Polypeptides for use in self-assembling protein nanostructures

ABSTRACT

Synthetic nanostructures, polypeptides that are useful, for example, in making synthetic nanostructures, and methods for using such synthetic nanostructures are disclosed herein.

CROSS REFERENCE

This application claims priority to U.S. Provisional Patent Application Ser. No. 62/074,167, filed Nov. 3, 2014, incorporated by reference herein in its entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with U.S. government support under CHE-1332907, awarded by the National Science Foundation, and DGE-0718124, awarded by the National Science Foundation. The U.S. Government has certain rights in the invention.

BACKGROUND

Molecular self- and co-assembly of proteins into highly ordered, symmetric supramolecular complexes is an elegant and powerful means of patterning matter at the atomic scale. Recent years have seen advances in the development of self-assembling biomaterials, particularly those composed of nucleic acids. DNA has been used to create, for example, nanoscale shapes and patterns, molecular containers, and three-dimensional macroscopic crystals.

Methods for designing self-assembling proteins have progressed more slowly, yet the functional and physical properties of proteins make them attractive as building blocks for the development of advanced functional materials.

SUMMARY OF THE INVENTION

In a first aspect, the invention provides isolated polypeptides comprising an amino acid sequence that is at least 75% identical over its length, and identical at least at one identified interface position, to the amino acid sequence of a polypeptide selected from the group consisting of SEQ ID NOS: 1-34.

In a second aspect, the invention provides nanostructures, comprising:

(a) a plurality of first assemblies, each first assembly comprising a plurality of identical first polypeptides, wherein the first polypeptides comprise the polypeptide of claim 1; and

(b) a plurality of second assemblies, each second assembly comprising a plurality of identical second polypeptides, wherein the second polypeptides comprise the polypeptide of claim 1, and wherein the second polypeptide differs from the first polypeptide;

wherein the plurality of first assemblies non-covalently interact with the plurality of second assemblies to form a nanostructure.

In another aspect, the present invention provides isolated nucleic acids encoding the polypeptides of the invention. In a further aspect, the invention provides nucleic acid expression vectors comprising isolated nucleic acids of the invention. In another aspect, the present invention provides recombinant host cells comprising a nucleic acid expression vector according to the invention.

In a further aspect, the present invention provides a kit comprising one or more isolated nanostructures of the invention; one or more of the isolated proteins of the present invention or the assemblies of the present invention; one or more recombinant nucleic acids of the present invention; one or more recombinant expression vectors of the present invention; and/or one or more recombinant host cells of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing aspects and many of the attendant advantages of this invention will become more readily appreciated as the same become better understood by reference to the following detailed description, when taken in conjunction with the accompanying drawings.

FIG. 1. Overview of the design method used to produce the exemplary nanostructures and sequences, illustrated with the I53 icosahedral architecture. (A) A schematic illustration of icosahedral symmetry outlined with dashed lines, with the five-fold symmetry axes passing through each vertex and the three-fold symmetry axes passing through each face of the icosahedron. (B) Twelve pentamers (dark grey) and 20 trimers (light grey) are aligned along the 5-fold and 3-fold symmetry axes, respectively. Each oligomer possesses two rigid-body degrees of freedom, one translational (r) and one rotational (ω), that are systematically sampled to identify configurations with large interfaces and high densities of contacting residues suitable for protein-protein interface design. (C) Example of such a docked configuration. (D) Close-up of the docked interface between the pentameric and trimeric subunits, as outlined in panel C. Side-chain atoms beyond the beta carbon are ignored at this stage of design. (E) New amino acid sequences are designed at the interface to stabilize the modeled configuration.
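
The rigid-body search described for panel B can be illustrated with an exhaustive grid scan. The sketch below is a minimal, hedged illustration rather than the design software actually used: it assumes that each oligomer contributes one displacement r along, and one rotation ω about, its own symmetry axis (four degrees of freedom for a two-component I53 design), and `toy_score` is a hypothetical placeholder for a score that rewards large interfaces and dense residue contacts. Because rotating an n-fold oligomer by 360/n degrees about its axis reproduces the same arrangement, ω only needs to be sampled over [0, 360/n).

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple


def angle_grid(fold: int, step_deg: float) -> List[float]:
    """Rotations about a C_n axis repeat every 360/n degrees, so only [0, 360/n) is sampled."""
    period = 360.0 / fold
    n_steps = int(round(period / step_deg))
    return [i * step_deg for i in range(n_steps)]


def grid_search(
    score: Callable[..., float], grids: Sequence[Sequence[float]]
) -> Tuple[float, tuple]:
    """Evaluate score(*point) at every point of the Cartesian grid and keep the best."""
    best_score, best_point = -math.inf, ()
    for point in itertools.product(*grids):
        s = score(*point)
        if s > best_score:
            best_score, best_point = s, point
    return best_score, best_point


def toy_score(r5: float, w5: float, r3: float, w3: float) -> float:
    """Hypothetical stand-in for an interface-size / contact-density score."""
    return -((r5 - 40) ** 2 + (w5 - 10) ** 2 + (r3 - 35) ** 2 + (w3 - 20) ** 2)


if __name__ == "__main__":
    # Order of degrees of freedom: (r_pentamer, omega_pentamer, r_trimer, omega_trimer)
    grids = [range(38, 43), angle_grid(5, 2.0), range(33, 38), angle_grid(3, 2.0)]
    best_score, best_point = grid_search(toy_score, grids)
    print(best_score, best_point)
```

In practice the score would be computed from the docked backbone coordinates; the grid bounds and step sizes here are arbitrary choices for illustration.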

FIG. 2. Design models of exemplary nanostructures. Computational models of the 11 exemplary nanomaterials, (A) I53-34, (B) I53-40, (C) I53-47, (D) I53-50, (E) I53-51, (F) I52-03, (G) I52-32, (H) I52-33, (I) I32-06, (J) I32-19, and (K) I32-28, are shown to scale (relative to the 30 nm scale bar), viewed down one of the 5-fold icosahedral symmetry axes with ribbon-style rendering of the protein backbone. Each I53 material comprises 12 identical pentamers (dark grey) and 20 identical trimers (light grey), each I52 material comprises 12 identical pentamers (dark grey) and 30 identical dimers (light grey), and each I32 material comprises 20 identical trimers (dark grey) and 30 identical dimers (light grey), with the designed interface formed between these oligomeric building blocks. All renderings were generated using PyMOL® (Schrödinger, LLC).
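
The oligomer counts in FIG. 2 follow from the 60 rotational symmetry operations of the icosahedral point group: placing an n-fold oligomer on each n-fold axis requires 60/n copies, and each copy contributes n chains, so every component supplies 60 polypeptide chains. The short sketch below assumes that the two digits in a name such as I53 give the symmetries of the two components (consistent with the caption's descriptions) and reproduces the counts stated above.

```python
ICOSAHEDRAL_ORDER = 60  # number of proper rotations in the icosahedral point group


def component_counts(architecture: str) -> dict:
    """Map each component's symmetry fold to (number of oligomers, number of chains).

    Assumes names of the form 'I' followed by two fold digits, e.g. 'I53'.
    """
    if len(architecture) != 3 or architecture[0] != "I":
        raise ValueError(f"unexpected architecture name: {architecture!r}")
    counts = {}
    for digit in architecture[1:]:
        fold = int(digit)
        if fold not in (2, 3, 5):  # the only rotation axes of an icosahedron
            raise ValueError(f"{fold}-fold axis is not an icosahedral symmetry axis")
        n_oligomers = ICOSAHEDRAL_ORDER // fold
        counts[fold] = (n_oligomers, n_oligomers * fold)
    return counts


for name in ("I53", "I52", "I32"):
    print(name, component_counts(name))
# I53 {5: (12, 60), 3: (20, 60)}
# I52 {5: (12, 60), 2: (30, 60)}
# I32 {3: (20, 60), 2: (30, 60)}
```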

FIG. 3. Sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and mass spectrometry analysis. The pair of proteins making up each material was co-expressed (as described in the Methods of Production) in E. coli; the cells were lysed, and the proteins were purified via nickel-affinity chromatography followed by gel filtration on a Superose® 6 10/300 GL column (GE Life Sciences). (A) The resulting samples were subjected to SDS-PAGE followed by staining with GelCode® Blue Stain Reagent (Pierce Biotechnology, Inc.). The left lane in each panel contains protein molecular weight standards; the approximate molecular weights in kilodaltons are indicated directly to the left of each band. The right lanes in each panel contain the purified samples. For all of the materials except I52-03, clear bands of similar staining intensity, near the expected molecular weights of each protein subunit, are present for each of the two proteins comprising the purified materials. (B) While only one band (near the expected molecular weight of 27 kDa for the dimer subunit) is clearly distinguishable for I52-03 by SDS-PAGE, mass spectrometry analysis shows that the other protein subunit is also present in the sample; the mass spectrometry peak at 21,029 Da closely matches the expected molecular weight of 21,026 Da for the pentamer subunit with loss of the initiator methionine, a common post-translational modification.
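
Comparing the observed 21,029 Da peak with the expected 21,026 Da requires an expected mass computed with and without the initiator methionine. A minimal sketch of that arithmetic follows. It assumes standard average residue masses, an unmodified chain, no disulfide bonds, and that any purification tag is included in the input sequence; it is an illustration, not necessarily the calculation used for FIG. 3.

```python
# Average (not monoisotopic) residue masses in daltons, i.e. free amino acid minus H2O.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water accounts for the free N- and C-termini


def average_mass(seq: str, remove_initiator_met: bool = False) -> float:
    """Average mass (Da) of an unmodified polypeptide chain."""
    seq = seq.strip().upper()
    if remove_initiator_met and seq.startswith("M"):
        seq = seq[1:]  # models cleavage of the initiator methionine
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    example = "MKMEELFKKHK"  # first residues of SEQ ID NO: 7, for illustration only
    full = average_mass(example)
    processed = average_mass(example, remove_initiator_met=True)
    print(f"{full:.1f} Da -> {processed:.1f} Da (difference {full - processed:.2f} Da)")
```

Removing the initiator methionine lowers the average mass by one Met residue (about 131.19 Da), which is why the processed form, rather than the full-length translation product, is the appropriate comparison for the observed peak.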

FIG. 4. Negative stain averages. Averages were obtained for the (A) I53-40, (B) I53-50, (C) I52-03, and (D) I32-06 nanostructures and found to match well with the design models. Raw negative stain micrographs from which the averages were generated are shown on the left side of each panel. Averages (left), along with renderings from the design models (right), are shown on the right side of each panel. Views are shown corresponding approximately to the 5-fold, 3-fold, and 2-fold symmetry axes.

FIG. 5. X-ray crystallography. X-ray crystal structures (bottom) ranging from 3.5 to 5.0 Å resolution have been obtained for three of the designed materials, (A) I53-40, (B) I52-32, and (C) I32-28, and found to match closely with the design models (top). Each structure is shown using a ribbon-style rendering. Views of the I53 and I52 design models and crystal structures (panels A and B) are shown looking down one of the 5-fold symmetry axes, while the I32 design model and crystal structure (panel C) are shown looking down one of the 3-fold symmetry axes. Each crystal structure contains only a portion of the full icosahedron in the asymmetric unit. Crystal lattice symmetry was applied to generate the full icosahedra shown in the bottom panel. The I53-40 design model and crystal structure (panel A) comprise 12 pentamers (dark grey) and 20 trimers (light grey), while the I52-32 design model and crystal structure (panel B) comprise 12 pentamers (dark grey) and 30 dimers (light grey), and the I32-28 design model and crystal structure (panel C) comprise 20 trimers (dark grey) and 30 dimers (light grey). All renderings were generated using PyMOL® (Schrödinger, LLC).

FIG. 6. In vitro assembly of I53-50A.1PosT1+I53-50B.4PosT1 in the presence of 400 nucleotide (nt) ssDNA leads to encapsulation and protection of the ssDNA. Mixtures of 26 ng/μL ssDNA and various proteins were analyzed by agarose gel electrophoretic mobility shift assay (EMSA) after incubation for 16 hours to determine the ability of mixtures of I53-50A.1PosT1+I53-50B.4PosT1 to encapsulate the ssDNA (left; the upper image of the gel is after staining for DNA, while the lower image of the gel is after staining for protein). Mixtures of both components with the DNA (lanes labeled “Components titration” are mixtures of I53-50A.1PosT1+I53-50B.4PosT1 at 2, 4, 6, 8, 10 and 12 μM) shift the DNA such that it migrates similarly to SEC-purified I53-50A.1PosT1+I53-50B.4PosT1 nanoparticles (upper band), while mixtures of DNA with only one protein component or the other do not. The mixtures were then incubated with 25 μg/mL DNase I for 1 hour at room temperature in order to evaluate the ability of the in vitro-assembled nanoparticles to protect the ssDNA cargo from degradation (right; the upper image of the gel is after staining for DNA, while the lower image of the gel is after staining for protein). The DNA that co-migrates with the protein in mixtures of both components (I53-50A.1PosT1+I53-50B.4PosT1; lanes labeled “Components titration” are mixtures at 2, 4, 6, 8, 10 and 12 μM) is largely protected from DNase challenge, while free ssDNA and the mixture of ssDNA+I53-50B.4PosT1 are not. The mixture of ssDNA+I53-50A.1PosT1 is weakly protected, but migrates as a diffuse smear on the gel. Overall, the data show that the ssDNA is encapsulated in nanoparticles formed by I53-50A.1PosT1+I53-50B.4PosT1, which form a barrier that prevents degradation of the ssDNA by DNase.

FIG. 7. In vitro assembly of I53-50A.1PosT1+I53-50B.4PosT1 in the presence of 1600 nucleotide (nt) ssDNA leads to encapsulation and protection of the ssDNA. Mixtures of 35.2 ng/μL ssDNA and various proteins were analyzed by agarose gel electrophoretic mobility shift assay (EMSA) after incubation for 16 hours to determine the ability of mixtures of I53-50A.1PosT1+I53-50B.4PosT1 to encapsulate the ssDNA (left; the upper image of the gel is after staining for DNA, while the lower image of the gel is after staining for protein). Mixtures of both components with the DNA (lanes labeled “Components titration” are mixtures of I53-50A.1PosT1+I53-50B.4PosT1 at 2, 4, 6, 8, 10 and 12 μM) shift the DNA such that it migrates similarly to SEC-purified I53-50A.1PosT1+I53-50B.4PosT1 nanoparticles (upper band), while mixtures of DNA with only one protein component or the other do not. The mixtures were then incubated with 25 μg/mL DNase I for 1 hour at room temperature in order to evaluate the ability of the in vitro-assembled nanoparticles to protect the ssDNA cargo from degradation (right; the upper image of the gel is after staining for DNA, while the lower image of the gel is after staining for protein). The DNA that co-migrates with the protein in mixtures of both components (I53-50A.1PosT1+I53-50B.4PosT1; lanes labeled “Components titration” are mixtures at 2, 4, 6, 8, 10 and 12 μM) is largely protected from DNase challenge, while free ssDNA and the mixture of ssDNA+I53-50B.4PosT1 are not. The mixture of ssDNA+I53-50A.1PosT1 is weakly protected, but migrates as a diffuse smear on the gel. Overall, the data show that the ssDNA is encapsulated in nanoparticles formed by I53-50A.1PosT1+I53-50B.4PosT1, which form a barrier that prevents degradation of the ssDNA by DNase.
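
To compare the nucleic acid and protein concentrations used in FIGS. 6 and 7, the ssDNA mass concentrations can be converted to approximate molar concentrations. The sketch below assumes an average of about 330 g/mol per nucleotide, a common rule of thumb that is not stated in the figure descriptions. Under that assumption the ssDNA is roughly 200 nM (400 nt at 26 ng/μL) and 67 nM (1600 nt at 35.2 ng/μL), well below the 2 to 12 μM component concentrations used in the titrations.

```python
NT_AVG_MASS = 330.0  # g/mol per nucleotide; rule-of-thumb average for ssDNA (assumption)


def ssdna_nanomolar(conc_ng_per_ul: float, length_nt: int, nt_mass: float = NT_AVG_MASS) -> float:
    """Convert an ssDNA mass concentration to an approximate molar concentration (nM)."""
    grams_per_liter = conc_ng_per_ul * 1e-3  # 1 ng/uL == 1 mg/L == 1e-3 g/L
    molar_mass = length_nt * nt_mass  # g/mol, ignoring end-group corrections
    return grams_per_liter / molar_mass * 1e9  # mol/L -> nmol/L


if __name__ == "__main__":
    for label, conc, length in [("FIG. 6", 26.0, 400), ("FIG. 7", 35.2, 1600)]:
        print(f"{label}: ~{ssdna_nanomolar(conc, length):.0f} nM ssDNA")
```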

DETAILED DESCRIPTION

All references cited are herein incorporated by reference in their entirety. Within this application, unless otherwise stated, the techniques utilized may be found in any of several well-known references such as: Molecular Cloning: A Laboratory Manual (Sambrook, et al., 1989, Cold Spring Harbor Laboratory Press), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited by D. Goeddel, 1991, Academic Press, San Diego, Calif.), “Guide to Protein Purification” in Methods in Enzymology (M. P. Deutscher, ed., (1990) Academic Press, Inc.); PCR Protocols: A Guide to Methods and Applications (Innis, et al., 1990, Academic Press, San Diego, Calif.), Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed. (R. I. Freshney, 1987, Liss, Inc., New York, N.Y.), Gene Transfer and Expression Protocols, pp. 109-128 (ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.), and the Ambion 1998 Catalog (Ambion, Austin, Tex.).

As used herein, the singular forms “a”, “an” and “the” include plural referents unless the context clearly dictates otherwise. “And” as used herein is interchangeably used with “or” unless expressly stated otherwise.

As used herein, the amino acid residues are abbreviated as follows: alanine (Ala; A), asparagine (Asn; N), aspartic acid (Asp; D), arginine (Arg; R), cysteine (Cys; C), glutamic acid (Glu; E), glutamine (Gln; Q), glycine (Gly; G), histidine (His; H), isoleucine (Ile; I), leucine (Leu; L), lysine (Lys; K), methionine (Met; M), phenylalanine (Phe; F), proline (Pro; P), serine (Ser; S), threonine (Thr; T), tryptophan (Trp; W), tyrosine (Tyr; Y), and valine (Val; V). As used herein, “about” means +/−5% of the recited parameter.

All embodiments of any aspect of the invention can be used in combination, unless the context clearly dictates otherwise.

Unless the context clearly requires otherwise, throughout the description and the claims, the words ‘comprise’, ‘comprising’, and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in the sense of “including, but not limited to”. Words using the singular or plural number also include the plural and singular number, respectively. Additionally, the words “herein,” “above,” and “below” and words of similar import, when used in this application, shall refer to this application as a whole and not to any particular portions of the application.

The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While the specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize.

In a first aspect, the invention provides isolated polypeptides comprising an amino acid sequence that is at least 75% identical over its length, and identical at least at one identified interface position, to the amino acid sequence of a polypeptide selected from the group consisting of SEQ ID NOS: 1-34. The isolated polypeptides of the invention can be used, for example, to prepare the nanostructures of the invention. As described in the examples that follow, the polypeptides of the invention were designed for their ability to self-assemble in pairs to form nanostructures, such as icosahedral nanostructures. The design process involved designing suitable interface residues for each member of the polypeptide pair that can be assembled to form the nanostructure. The nanostructures of the invention include symmetrically repeated, non-natural, non-covalent polypeptide-polypeptide interfaces that orient a first assembly and a second assembly into a nanostructure, such as one with icosahedral symmetry. The starting proteins were derived from pentameric, trimeric, and dimeric crystal structures in the Protein Data Bank (PDB), along with a small number of crystal structures of de novo designed proteins not yet deposited in the PDB. Thus, each of the polypeptides of the present invention includes one or more modifications at “interface residues” compared to the starting proteins, permitting the polypeptides of the invention to, for example, form icosahedral nanostructures as described herein. Table 1 provides the amino acid sequences of exemplary polypeptides of the invention and also identifies the residue numbers in each exemplary polypeptide that were identified as present at the interface of the resulting assembled nanostructures (i.e., “identified interface residues”). As can be seen, the number of interface residues for the exemplary polypeptides of SEQ ID NOS: 1-34 ranges from 4 to 13.

In various embodiments, the isolated polypeptides of the invention comprise an amino acid sequence that is at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical over its length, and identical at least at 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 identified interface positions (depending on the number of interface residues for a given polypeptide), to the amino acid sequence of a polypeptide selected from the group consisting of SEQ ID NOS: 1-34. In other embodiments, the isolated polypeptides of the invention comprise an amino acid sequence that is at least 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% identical over its length, and identical at least at 20%, 25%, 33%, 40%, 50%, 60%, 70%, 75%, 80%, 90%, or 100% of the identified interface positions, to the amino acid sequence of a polypeptide selected from the group consisting of SEQ ID NOS: 1-34. In further embodiments, the polypeptides of the invention comprise or consist of a polypeptide having the amino acid sequence of a polypeptide selected from the group consisting of SEQ ID NOS: 1-40.
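
The identity criteria above can be checked programmatically. The sketch below is a simplification that assumes the candidate and reference sequences have the same length and no gaps, so that percent identity is a position-by-position count; sequences of different lengths would first need a pairwise alignment. Interface positions are 1-based residue numbers, as listed in Table 1.

```python
from typing import Iterable


def percent_identity(query: str, reference: str) -> float:
    """Percent identity for an ungapped comparison of equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("sketch assumes ungapped, equal-length sequences")
    matches = sum(q == r for q, r in zip(query, reference))
    return 100.0 * matches / len(reference)


def satisfies_criteria(
    query: str,
    reference: str,
    interface_positions: Iterable[int],
    min_identity: float = 75.0,
    min_interface_matches: int = 1,
) -> bool:
    """Check overall identity plus identity at 1-based identified interface positions."""
    if percent_identity(query, reference) < min_identity:
        return False
    kept = sum(query[p - 1] == reference[p - 1] for p in interface_positions)
    return kept >= min_interface_matches
```

For a fractional criterion, such as identity at 50% of the identified interface positions, `min_interface_matches` can be set to the rounded-up product of that fraction and the number of listed positions.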

TABLE 1
Exemplary polypeptides: name (SEQ ID NO), amino acid sequence, and identified interface residues

I53-34A (SEQ ID NO: 1)
Sequence: MEGMDPLAVLAESRLLPLLTVRGGEDLAGLATVLELMGVGALEITLRTEKGLEALKALRKSGLLLGAGTVRSPKEAEAALEAGAAFLVSPGLLEEVAALAQARGVPYLPGVLTPIEVERALALGLSALKFFPAEPFQGVRVLRAYAEVFPEVRFLPTGGIKEEHLPHYAALPNLLAVGGSWLLQGDLAAVMKKVKAAKALLSPQAPG
Identified interface residues: 28, 32, 36, 37, 186, 188, 191, 192, 195

I53-34B (SEQ ID NO: 2)
Sequence: MTKKVGIVDTTFARVDMAEAAIRTLKALSPNIKIIRKTVPGIKDLPVACKKLLEEEGCDIVMALGMPGKAEKDKVCAHEASLGLMLAQLMTNKHIIEVFVHEDEAKDDDELDILALVRAIEHAANVYYLLFKPEYLTRMAGKGLRQGREDAGPARE
Identified interface residues: 19, 20, 23, 24, 27, 109, 113, 116, 117, 120, 124, 148

I53-40A (SEQ ID NO: 3)
Sequence: MTKKVGIVDTTFARVDMASAAILTLKMESPNIKIIRKTVPGIKDLPVACKKLLEEEGCDIVMALGMPGKAEKDKVCAHEASLGLMLAQLMTNKHIIEVFVHEDEAKDDAELKILAARRAIEHALNVYYLLFKPEYLIRMAGKGLRQGFEDAGPARE
Identified interface residues: 20, 23, 24, 27, 28, 109, 112, 113, 116, 120, 124

I53-40B (SEQ ID NO: 4)
Sequence: MSTINNQLKALKVIPVIAIDNAEDIIPLGKVLAENGLPAAEITFRSSAAVKAIMLLRSAQPEMLIGAGTILNGVQALAAKEAGATFVVSPGFNPNTVRACQIIGIDIVPGVNNPSTVEAALEMGLTTLKFFPAEASGGISMVKSLVGPYGDIRLMPTGGITPSNIDNYLAIPQVLACGGTWMVDKKLVTNGEWDEIARLTREIVEQVNP
Identified interface residues: 47, 51, 54, 58, 74, 102

I53-47A (SEQ ID NO: 5)
Sequence: MPIFTLNTNIKATDVPSDFLSLTSRLVGLILSKPGSYVAVHINTDQQLSFGGSTNPAAFGTLMSIGGIEPSKNRDHSAVLFDHLNAMLGIPKNRMYIHFVNLNGDDVGWNGTTF
Identified interface residues: 22, 25, 29, 72, 79, 86, 87

I53-47B (SEQ ID NO: 6)
Sequence: MNQHSHKDYETVRIAVVRARWHADIVDACVEAFEIAMAAIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVNGGIYRHEFVASAVIDGMMNVQLSTGVPVLSAVLTPHRYRDSAEHHRFFAAHFAVKGVEAARACIEILAAREKIAA
Identified interface residues: 28, 31, 35, 36, 39, 131, 132, 135, 139, 146

I53-50A (SEQ ID NO: 7)
Sequence: MKMEELFKKHKIVAVLRANSVEEAIEKAVAVFAGGVHLIEITFTVPDADTVIKALSVLKEKGAIIGAGTVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGHTILKLFPGEVVGPQFVKAMKGPFPNVKFVPTGGVNLDNVCEWFKAGVLAVGVGSALVKGTPDEVREKAKAFVEKIRGCTE
Identified interface residues: 25, 29, 33, 54, 57

I53-50B (SEQ ID NO: 8)
Sequence: MNQHSHKDYETVRIAVVRARWHAEIVDACVSAFEAAMADIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVNGGIYRHEFVASAVIDGMMNVQLSTGVPVLSAVLTPHRYRDSDAHTLLFLALFAVKGMEAARACVEILAAREKIAA
Identified interface residues: 24, 28, 36, 124, 125, 127, 128, 129, 131, 132, 133, 135, 139

I53-51A (SEQ ID NO: 9)
Sequence: MFTKSGDDGNTNVINKRVGKDSPLVNFLGDLDELNSFIGFAISKIPWEDMKKDLERVQVELFEIGEDLSTQSSKKKIDESYVLWLLAATAIYRIESGPVKLFVIPGGSEEASVLHVTRSVARRVERNAVKYTKELPEINRMIIVYLNRLSSLLFAMALVANKRRNQSEKIYEIGKSW
Identified interface residues: 80, 83, 86, 87, 88, 90, 91, 94, 166, 172, 176

I53-51B (SEQ ID NO: 10)
Sequence: MNQHSHKDYETVRIAVVRARWHADIVDQCVRAFEEAMADAGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVNGGIYRHEFVASAVIDGMMNVQLSTGVPVLSAVLTPHRYRSSREHHEFFREHFMVKGVEAAAACITILAAREKIAA
Identified interface residues: 31, 35, 36, 40, 122, 124, 128, 131, 135, 139, 143, 146, 147

I52-03A (SEQ ID NO: 11)
Sequence: MGHTKGPTPQQHDGSALRIGIVHARWNKTIIMPLLIGTIAKLLECGVKASNIVVQSVPGSWELPIAVQRLYSASQLQTPSSGPSLSAGDLLGSSTTDLTALPTTTASSTGPFDALIAIGVLIKGETMHFEYIADSVSHGLMRVQLDTGVPVIFGVLTVLTDDQAKARAGVIEGSHNHGEDWGLAAVEMGVRRRDWAAGKTE
Identified interface residues: 28, 32, 36, 39, 44, 49

I52-03B (SEQ ID NO: 12)
Sequence: MYEVDHADVYDLFYLGRGKDYAAEASDIADLVRSRTPEASSLLDVACGTGTHLEHFTKEFGDTAGLELSEDMLTHARKRLPDATLHQGDMRDFQLGRKFSAVVSMFSSVGYLKTVAELGAAVASFAEHLEPGGVVVVEPWWFPETFADGWVSADVVRRDGRTVARVSHSVREGNATRMEVHFTVADPGKGVRHFSDVHLITLFHQREYEAAFMAAGLRVEYLEGGPSGRGLFVGVPA
Identified interface residues: 94, 115, 116, 206, 213

I52-32A (SEQ ID NO: 13)
Sequence: MGMKEKFVLIITHGDFGKGLLSGAEVIIGKQENVHTVGLNLGDNIEKVAKEVMRIIIAKLAEDKEIIIVVDLFGGSPFNIALEMMKTFDVKVITGINMPMLVELLTSINVYDTTELLENISKIGKDGIKVIEKSSLKM
Identified interface residues: 47, 49, 53, 54, 57, 58, 61, 83, 87, 88

I52-32B (SEQ ID NO: 14)
Sequence: MKYDGSKLRIGILHARWNLEIIAALVAGAIKRLQEFGVKAENIIIETVPGSFELPYGSKLFVEKQKRLGKPLDAIIPIGVLIKGSTMHFEYICDSTTHQLMKLNFELGIPVIFGVLTCLTDEQAEARAGLIEGKMHNHGEDWGAAAVEMATKFN
Identified interface residues: 19, 20, 23, 30, 40

I52-33A (SEQ ID NO: 15)
Sequence: MAVKGLGEVDQKYDGSKLRIGILHARWNRKIILALVAGAVLRLLEFGVKAENIIIETVPGSFELPYGSKLFVEKQKRLGKPLDAIIPIGVLIKGSTMHFEYICDSTTHQLMKLNFELGIPVIFGVLTCLTDEQAEARAGLIEGKMHNHGEDWGAAAVEMATKFN
Identified interface residues: 33, 41, 44, 50

I52-33B (SEQ ID NO: 16)
Sequence: MGANWYLDNESSRLSFTSTKNADIAEVHRFLVLHGKVDPKGLAEVEVETESISTGIPLRDMLLRVLVFQVSKFPVAQINAQLDMRPINNLAPGAQLELRLPLTVSLRGKSHSYNAELLATRLDERRFQVVTLEPLVIHAQDFDMVRAFNALRLVAGLSAVSLSVPVGAVLIFTAR
Identified interface residues: 61, 63, 66, 67, 72, 147, 148, 154, 155

I32-06A (SEQ ID NO: 17)
Sequence: MTDYIRDGSAIKALSFAIILAEADLRHIPQDLQRLAVRVIHACGMVDVANDLAFSEGAGKAGRNALLAGAPILCDARMVAEGITRSRLPADNRVIYTLSDPSVPELAKKIGNTRSAAALDLWLPHIEGSIVAIGNAPTALFRLFELLDAGAPKPALIIGMPVGFVGAAESKDELAANSRGVPYVIVRGRRGGSAMTAAAVNALASERE
Identified interface residues: 9, 12, 13, 14, 20, 30, 33, 34

I32-06B (SEQ ID NO: 18)
Sequence: MITVFGLKSKLAPRREKLAEVIYSSLHLGLDIPKGKHAIRFLCLEKEDFYYPFDRSDDYTVIEINLMAGRSEETKMLLIFLLFIALERKLGIRAHDVEITIKEQPAHCWGFRGRTGDSARDLDYDIYV
Identified interface residues: 24, 71, 73, 76, 77, 80, 81, 84, 85, 88, 114, 118

I32-19A (SEQ ID NO: 19)
Sequence: MGSDLQKLQRFSTCDISDGLLNVYNIPTGGYFPNLTAISPPQNSSIVGTAYTVLFAPIDDPRPAVNYIDSVPPNSILVLALEPHLQSQFHPFIKITQAMYGGLMSTRAQYLKSNGTVVFGRIRDVDEHRTLNHPVFAYGVGSCAPKAVVKAVGTNVQLKILTSDGVTQTICPGDYIAGDNNGIVRIPVQETDISKLVTYIEKSIEVDRLVSEAIKNGLPAKAAQTARRMVLKDYI
Identified interface residues: 208, 213, 218, 222, 225, 226, 229, 233

I32-19B (SEQ ID NO: 20)
Sequence: MSGMRVYLGADHAGYELKQAIIAFLKMTGHEPIDCGALRYDADDDYPAFCIAAATRTVADPGSLGIVLGGSGNGEQIAANKVPGARCALAWSVQTAALAREHNNAQLIGIGGRMHTLEEALRIVKAFVTTPWSKAQRHQRRIDILAEYERTHEAPPVPGAPA
Identified interface residues: 20, 23, 24, 27, 117, 118, 122, 125

I32-28A (SEQ ID NO: 21)
Sequence: MGDDARIAAIGDVDELNSQIGVLLAEPLPDDVRAALSAIQHDLFDLGGELCIPGHAAITEDHLLRLALWLVHYNGQLPPLEEFILPGGARGAALAHVCRTVCRRAERSIKALGASEPLNIAPAAYVNLLSDLLFVLARVLNRAAGGADVLWDRTRAH
Identified interface residues: 60, 61, 64, 67, 68, 71, 110, 120, 123, 124, 128

I32-28B (SEQ ID NO: 22)
Sequence: MILSAEQSFTLRHPHGQAAALAFVREPAAALAGVQRLRGLDSDGEQVWGELLVRVPLLGEVDLPFRSEIVRTPQGAELRPLTLTGERAWVAVSGQATAAEGGEMAFAFQFQAHLATPEAEGEGGAAFEVMVQAAAGVTLLLVAMALPQGLAAGLPPA
Identified interface residues: 35, 36, 54, 122, 129, 137, 140, 141, 144, 148

I53-40A.1 (SEQ ID NO: 23)
Sequence: MTKKVGIVDTTFARVDMASAAILTLKMESPNIKIIRKTVPGIKDLPVACKKLLEEEGCDIVMALGMPGKKEKDKVCAHEASLGLMLAQLMTNKHIIEVFVHEDEAKDDAELKILAARRAIEHALNVYYLLFKPEYLIRMAGKGLRQGFEDAGPARE
Identified interface residues: 20, 23, 24, 27, 28, 109, 112, 113, 116, 120, 124

I53-40B.1 (SEQ ID NO: 24)
Sequence: MDDINNQLKRLKVIPVIAIDNAEDIIPLGKVLAENGLPAAEITFRSSAAVKAIMLLRSAQPEMLIGAGTILNGVQALAAKEAGADFVVSPGFNPNTVRACQIIGIDIVPGVNNPSTVEQALEMGLTTLKFFPAEASGGISMVKSLVGPYGDIRLMPTGGITPDNIDNYLAIPQVLACGGTWMVDKKLVRNGEWDEIARLTREIVEQVNP
Identified interface residues: 47, 51, 54, 58, 74, 102

I53-47A.1 (SEQ ID NO: 25)
Sequence: MPIFTLNTNIKADDVPSDFLSLTSRLVGLILSKPGSYVAVHINTDQQLSFGGSTNPAAFGTLMSIGGIEPDKNRDHSAVLFDHLNAMLGIPKNRMYIHFVNLNGDDVGWNGTTF
Identified interface residues: 22, 25, 29, 72, 79, 86, 87

I53-47A.1NegT2 (SEQ ID NO: 26)
Sequence: MPIFTLNTNIKADDVPSDFLSLTSRLVGLILSEPGSYVAVHINIDQQLSFGGSTNPAAFGTLMSIGGIEPDKNEDHSAVLFDHLNAMLGIPKNRMYIHFVDLDGDDVGWNGTTF
Identified interface residues: 22, 25, 29, 72, 79, 86, 87

I53-47B.1 (SEQ ID NO: 27)
Sequence: MNQHSHKDHETVRIAVVRARWHADIVDACVEAFEIAMAAIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVNGGIYRHEFVASAVIDGMMNVQLDTGVPVLSAVLTPHRYRDSDEHHRFFAAHFAVKGVEAARACIEILNAREKIAA
Identified interface residues: 28, 31, 35, 36, 39, 131, 132, 135, 139, 146

I53-47B.1NegT2 (SEQ ID NO: 28)
Sequence: MNQHSHKDHETVRIAVVRARWHADIVDACVEAFEIAMAAIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVDGGIYDHEFVASAVIDGMMNVQLDTGVPVLSAVLTPHEYEDSDEDHEFFAAHFAVKGVEAARACIEILNAREKIAA
Identified interface residues: 28, 31, 35, 36, 39, 131, 132, 135, 139, 146

I53-50A.1 (SEQ ID NO: 29)
Sequence: MKMEELFKKHKIVAVLRANSVEEAIEKAVAVFAGGVHLIEITFTVPDADTVIKALSVLKEKGAIIGAGTVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGHDILKLFPGEVVGPQFVKAMKGPFPNVKFVPTGGVNLDNVCEWFKAGVLAVGVGDALVKGDPDEVREKAKKFVEKIRGCTE
Identified interface residues: 25, 29, 33, 54, 57

I53-50A.1NegT2 (SEQ ID NO: 30)
Sequence: MKMEELFKKHKIVAVLRANSVEEAIEKAVAVFAGGVHLIEITFTVPDADTVIKALSVLKEKGAIIGAGTVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGHDILKLFPGEVVGPEFVEAMKGPFPNVKFVPTGGVDLDDVCEWFDAGVLAVGVGDALVEGDPDEVREDAKEFVEEIRGCTE
Identified interface residues: 25, 29, 33, 54, 57

I53-50A.1PosT1 (SEQ ID NO: 31)
Sequence: MKMEELFKKHKIVAVLRANSVEEAIEKAVAVFAGGVHLIEITFTVPDADTVIKALSVLKEKGAIIGAGTVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGHDILKLFPGEVVGPQFVKAMKGPFPNVKFVPTGGVNLDNVCKWFKAGVLAVGVGKALVKGKPDEVREKAKKFVKKIRGCTE
Identified interface residues: 25, 29, 33, 54, 57

I53-50B.1 (SEQ ID NO: 32)
Sequence: MNQHSHKDHETVRIAVVRARWHAEIVDACVSAFEAAMRDIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVNGGIYRHEFVASAVIDGMMNVQLDTGVPVLSAVLTPHRYRDSDAHTLLFLALFAVKGMEAARACVEILAAREKIAA
Identified interface residues: 24, 28, 36, 124, 125, 127, 128, 129, 131, 132, 133, 135, 139

I53-50B.1NegT2 (SEQ ID NO: 33)
Sequence: MNQHSHKDHETVRIAVVRARWHAEIVDACVSAFEAAMRDIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVDGGIYDHEFVASAVIDGMMNVQLDTGVPVLSAVLTPHEYEDSDADTLLFLALFAVKGMEAARACVEILAAREKIAA
Identified interface residues: 24, 28, 36, 124, 125, 127, 128, 129, 131, 132, 133, 135, 139

I53-50B.4PosT1 (SEQ ID NO: 34)
Sequence: MNQHSHKDHETVRIAVVRARWHAEIVDACVSAFEAAMRDIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVVNGGIYRHEFVASAVINGMMNVQLNTGVPVLSAVLTPHNYDKSKAHTLLFLALFAVKGMEAARACVEILAAREKIAA
Identified interface residues: 24, 28, 36, 124, 125, 127, 128, 129, 131, 132, 133, 135, 139

In the genus sequences below, a parenthesized group such as (A/K) indicates that any one of the listed residues may occupy that position.

I53-40A genus (SEQ ID NO: 35)
MTKKVGIVDTTFARVDMASAAILTLKMESPNIKIIRKTVPGIKDLPVACKKLLEEEGCDIVMALGMPGK(A/K)EKDKVCAHEASLGLMLAQLMTNKHIIEVFVHEDEAKDDAELKILAARRAIEHALNVYYLLFKPEYLTRMAGKGLRQGFEDAGPARE

I53-40B genus (SEQ ID NO: 36)
M(S/D)(T/D)INNQLK(A/R)LKVIPVIAIDNAEDIIPLGKVLAENGLPAAEITFRSSAAVKAIMLLRSAQPEMLIGAGTILNGVQALAAKEAGA(T/D)FVVSPGFNPNTVRACQIIGIDIVPGVNNPSTVE(A/Q)ALEMGLTTLKFFPAEASGGISMVKSLVGPYGDIRLMPTGGITP(S/D)NIDNYLAIPQVLACGGTWMVDKKLV(T/R)NGEWDEIARLTREIVEQVNP

I53-47A genus (SEQ ID NO: 37)
MPIFTLNTNIKA(T/D)DVPSDFLSLTSRLVGLILS(K/E)PGSYVAVHINTDQQLSFGGSTNPAAFGTLMSIGGIEP(S/D)KN(R/E)DHSAVLFDHLNAMLGIPKNRMYIHFV(N/D)L(N/D)GDDVGWNGTTF

I53-47B genus (SEQ ID NO: 38)
MNQHSHKD(Y/H)ETVRIAVVRARWHADIVDACVEAFEIAMAAIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVV(N/D)GGIY(R/D)HEFVASAVIDGMMNVQL(S/D)TGVPVLSAVLTPH(R/E)Y(R/E)DS(A/D)E(H/D)H(R/E)FFAAHFAVKGVEAARACIEIL(A/N)AREKIAA

I53-50A genus (SEQ ID NO: 39)
MKMEELFKKHKIVAVLRANSVEEAIEKAVAVFAGGVHLIEITFTVPDADTVIKALSVLKEKGAIIGAGTVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGH(T/D)ILKLFPGEVVGP(Q/E)FV(K/E)AMKGPFPNVKFVPTGGV(N/D)LD(N/D)VC(E/K)WF(K/D)AGVLAVGVG(S/K/D)ALV(K/E)G(T/D/K)PDEVRE(K/D)AK(A/E/K)FV(E/K)(K/E)IRGCTE

I53-50B genus (SEQ ID NO: 40)
MNQHSHKD(Y/H)ETVRIAVVRARWHAEIVDACVSAFEAAM(A/R)DIGGDRFAVDVFDVPGAYEIPLHARTLAETGRYGAVLGTAFVV(N/D)GGIY(R/D)HEFVASAVI(D/N)GMMNVQL(S/D/N)TGVPVLSAVLTPH(R/E/N)Y(R/D/E)(D/K)S(D/K)A(H/D)TLLFLALFAVKGMEAARACVEILAAREKIAA
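
Membership in one of these genus sequences can be tested by translating each parenthesized alternative into a regular-expression character class. The sketch below assumes the notation is exactly as printed (single-letter residue codes separated by "/") and ignores any whitespace introduced by line wrapping.

```python
import re

# Matches positions such as "(A/K)" or "(S/K/D)"
_ALTERNATIVE = re.compile(r"\(([A-Z](?:/[A-Z])+)\)")


def genus_to_regex(genus: str) -> "re.Pattern[str]":
    """Turn '(A/K)'-style positions into '[AK]' character classes; other letters stay literal."""
    compact = re.sub(r"\s+", "", genus)  # drop layout whitespace inside the sequence
    pattern = _ALTERNATIVE.sub(lambda m: "[" + m.group(1).replace("/", "") + "]", compact)
    return re.compile(pattern)


def in_genus(sequence: str, genus: str) -> bool:
    """True if the whole sequence matches the genus, position by position."""
    return genus_to_regex(genus).fullmatch(sequence.strip()) is not None
```

For instance, passing a candidate sequence and the I53-47A genus text to `in_genus` checks whether that candidate falls within SEQ ID NO: 37.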

As is the case with proteins in general, the polypeptides are expected to tolerate some variation in the designed sequences without disrupting subsequent assembly into nanostructures, particularly when such variation comprises conservative amino acid substitutions. As used herein, “conservative amino acid substitution” means that: hydrophobic amino acids (Ala, Cys, Gly, Pro, Met, See, Sme, Val, Ile, Leu) can only be substituted with other hydrophobic amino acids; hydrophobic amino acids with bulky side chains (Phe, Tyr, Trp) can only be substituted with other hydrophobic amino acids with bulky side chains; amino acids with positively charged side chains (Arg, His, Lys) can only be substituted with other amino acids with positively charged side chains; amino acids with negatively charged side chains (Asp, Glu) can only be substituted with other amino acids with negatively charged side chains; and amino acids with polar uncharged side chains (Ser, Thr, Asn, Gln) can only be substituted with other amino acids with polar uncharged side chains.
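
The substitution classes defined above can be encoded directly. This sketch covers the 20 standard residues only; the two non-standard abbreviations in the hydrophobic list are omitted, and any residue outside every class is treated as non-conservative (an assumption).

```python
# Substitution classes from the definition above (standard residues only).
CONSERVATIVE_CLASSES = (
    frozenset("ACGPMVIL"),  # hydrophobic
    frozenset("FYW"),       # hydrophobic, bulky side chains
    frozenset("RHK"),       # positively charged side chains
    frozenset("DE"),        # negatively charged side chains
    frozenset("STNQ"),      # polar uncharged side chains
)


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same class; unclassified residues give False."""
    for residue_class in CONSERVATIVE_CLASSES:
        if original in residue_class:
            return replacement in residue_class
    return False


def substitutions(reference: str, variant: str) -> list:
    """List (1-based position, reference residue, new residue, conservative?) tuples."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, r, v, is_conservative(r, v))
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]
```

Note that charge-altering surface mutations, such as the Lys to Glu changes that distinguish some NegT2 variants in Table 1 from their parent sequences, are non-conservative under this definition.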

As will be apparent to those of skill in the art, the ability to widely modify surface amino acid residues without disrupting the polypeptide structure permits many types of modifications that endow the resulting self-assembled nanostructures with a variety of functions. In one non-limiting embodiment, the polypeptides of the invention can be modified to facilitate covalent linkage to a “cargo” of interest. In one non-limiting example, the polypeptides can be modified, such as by introducing cysteine residues at defined positions, to facilitate linkage to one or more antigens of interest, such that a nanostructure of the polypeptides would provide a scaffold presenting a large number of antigens for delivery as a vaccine to generate an improved immune response. In some embodiments, some or all native cysteine residues that are present in the polypeptides but not intended to be used for conjugation may be mutated to other amino acids to facilitate conjugation at defined positions. In another non-limiting embodiment, the polypeptides of the invention may be modified by linkage (covalent or non-covalent) with a moiety to help facilitate “endosomal escape.” For applications that involve delivering molecules of interest to a target cell, such as targeted delivery, a critical step can be escape from the endosome, a membrane-bound organelle that is the entry point of the delivery vehicle into the cell. Endosomes mature into lysosomes, which degrade their contents. Thus, if the delivery vehicle does not somehow “escape” from the endosome before it becomes a lysosome, it will be degraded and will not perform its function. A variety of lipids and organic polymers disrupt the endosome and allow escape into the cytosol. Thus, in this embodiment, the polypeptides can be modified, for example, by introducing cysteine residues that allow chemical conjugation of such a lipid or organic polymer to the monomer or to the surface of the resulting assembly. In another non-limiting example, the polypeptides can be modified, for example, by introducing cysteine residues that allow chemical conjugation of fluorophores or other imaging agents that allow visualization of the nanostructures of the invention in vitro or in vivo.
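
Designing a variant with a single reactive cysteine at a chosen position, while removing native cysteines that are not intended for conjugation, can be sketched as follows. The conjugation site is an input chosen by the designer, and this helper is hypothetical. Using Ala as the default replacement is an assumption (Cys and Ala fall in the same hydrophobic class in the conservative-substitution definition above); any replacement should be checked against the structure.

```python
from typing import List


def cysteine_positions(seq: str) -> List[int]:
    """1-based positions of every Cys in the sequence."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "C"]


def single_cysteine_variant(seq: str, site: int, replacement: str = "A") -> str:
    """Place one Cys at the 1-based `site` and replace all other native Cys residues.

    `replacement` defaults to Ala (an assumption; choose per structural context).
    """
    if not 1 <= site <= len(seq):
        raise ValueError(f"site {site} is outside a sequence of length {len(seq)}")
    residues = [replacement if aa == "C" else aa for aa in seq]
    residues[site - 1] = "C"  # the intended conjugation handle
    return "".join(residues)
```

If the chosen site already holds a native cysteine, that cysteine is kept and only the others are replaced.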

Surface amino acid residues on the polypeptides can be mutated in order to improve the stability or solubility of the protein subunits or the assembled nanostructures. As will be known to one of skill in the art, if the polypeptide has significant sequence homology to an existing protein family, a multiple sequence alignment of other proteins from that family can be used to guide the selection of amino acid mutations at non-conserved positions that can increase protein stability and/or solubility, a process referred to as consensus protein design (9).
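
A simple majority-vote version of consensus design can be sketched as below. It assumes a multiple sequence alignment has already been computed and trimmed to the query's residue positions (no gaps in the query), and it suggests replacing a query residue with the most frequent residue in its column when that residue reaches a chosen frequency threshold. Practical consensus design typically adds further filters, such as sequence weighting, that are not shown here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def consensus_suggestions(
    query: str,
    alignment: List[str],
    min_fraction: float = 0.5,
    exclude: Iterable[int] = (),
) -> Dict[int, Tuple[str, str]]:
    """Suggest {1-based position: (query residue, consensus residue)} mutations.

    `alignment` holds homolog sequences aligned to the ungapped query ('-' marks gaps).
    `exclude` lists positions to leave untouched, e.g. identified interface residues.
    """
    if any(len(s) != len(query) for s in alignment):
        raise ValueError("aligned sequences must match the query length")
    skip = set(exclude)
    suggestions = {}
    for i, q in enumerate(query):
        if i + 1 in skip:
            continue
        column = [s[i] for s in alignment if s[i] != "-"]
        if not column:
            continue
        residue, count = Counter(column).most_common(1)[0]
        if residue != q and count / len(column) >= min_fraction:
            suggestions[i + 1] = (q, residue)
    return suggestions
```

Passing the identified interface residues of Table 1 through `exclude` keeps the suggestions away from the designed interfaces.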

Surface amino acid residues on the polypeptides can be mutated to positively charged (Arg, Lys) or negatively charged (Asp, Glu) amino acids in order to endow the protein surface with an overall positive or overall negative charge. In one non-limiting embodiment, surface amino acid residues on the polypeptides can be mutated to endow the interior surface of the self-assembling nanostructure with a high net charge. Such a nanostructure can then be used to package or encapsulate a cargo molecule with the opposite net charge due to the electrostatic interaction between the nanostructure interior surface and the cargo molecule. In one non-limiting embodiment, surface amino acid residues on the polypeptides can be mutated primarily to arginine or lysine residues in order to endow the interior surface of the self-assembling nanostructure with a net positive charge. Solutions containing the polypeptides can then be mixed in the presence of a nucleic acid cargo molecule such as a dsDNA, ssDNA, dsRNA, ssRNA, cDNA, miRNA, siRNA, shRNA, piRNA, or other nucleic acid in order to encapsulate the nucleic acid inside the self-assembling nanostructure. Such a nanostructure could be used, for example, to protect, deliver, or concentrate nucleic acids.
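
A crude way to track how such mutations shift charge is to count charged residues. The sketch below assumes every Lys and Arg carries +1 and every Asp and Glu carries -1 near neutral pH, ignoring His, the termini, and structural context; it also counts all residues in the chain, not only those lining the interior surface of the nanostructure.

```python
def approximate_net_charge(seq: str) -> int:
    """Crude net charge near neutral pH: +1 per Lys/Arg, -1 per Asp/Glu.

    Ignores His, the chain termini, and environment-dependent pKa shifts.
    """
    seq = seq.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


def charge_change(reference: str, variant: str) -> int:
    """Change in the crude per-chain net charge from reference to variant."""
    return approximate_net_charge(variant) - approximate_net_charge(reference)
```

For example, `charge_change` applied to SEQ ID NO: 29 and SEQ ID NO: 31 (I53-50A.1 and I53-50A.1PosT1) gives the crude per-chain shift in net charge for the positively charged variant used in FIGS. 6 and 7.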

Table 2 lists surface amino acid residue numbers (surface residues not near the designed interface) for each exemplary polypeptide of the invention denoted by SEQ ID NOS: 1-34. Thus, in various embodiments, 1 or more (at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more) of these surface residues may be modified in the polypeptides of the invention.

TABLE 2 (Name; Amino Acid Sequence; Surface residues not near interface)

I53-34A (SEQ ID NO: 1)
Sequence: MEGMDPLAVLAESRLLPLLT VRGGEDLAGLATVLELMGVG ALEITLRTEKGLEALKALRK SGLLLGAGTVRSPKEAEAAL EAGAAFLVSPGLLEEVAALA QARGVPYLPGVLTPIEVERA LALGLSALKFFPAEPFQGVR VLRAYAEVFPEVRFLPTGGI KEEHLPHYAALPNLLAVGGS WLLQGDLAAVMKKVKAAKAL LSPQAPG
Surface residues not near interface: 6, 8, 9, 12, 14, 22, 25, 48, 49, 50, 52, 53, 56, 73, 74, 81, 94, 95, 101, 102, 103, 104, 119, 122, 137, 140, 143, 147, 150, 151, 153, 161, 162, 163, 164, 166, 167, 170, 172, 184, 193, 198, 199, 200, 202

I53-34B (SEQ ID NO: 2)
Sequence: MTKKVGIVDTTFARVDMAEA AIRTLKALSPNIKIIRKTVP GIKDLPVACKKLLEEEGCDI VMALGMPGKAEKDKVCAHEA SLGLMLAQLMTNKHIIEVFV HEDEAKDDDELDILALVRAI EHAANVYYLLFKPEYLTRMA GKGLRQGREDAGPARE
Surface residues not near interface: 3, 12, 31, 33, 35, 36, 51, 54, 55, 56, 59, 69, 70, 71, 74, 93, 103, 106, 107, 108, 131, 132, 133, 134, 138, 142, 153

I53-40A (SEQ ID NO: 3)
Sequence: MTKKVGIVDTTFARVDMASA AILTLKMESPNIKIIRKTVP GIKDLPVACKKLLEEEGCDI VMALGMPGKAEKDKVCAHEA SLGLMLAQLMTNKHIIEVFV HEDEAKDDAELKILAARRAI EHALNVYYLLFKPEYLIRMA GKGLRQGFEDAGPARE
Surface residues not near interface: 3, 4, 31, 33, 35, 36, 37, 51, 54, 55, 56, 57, 59, 69, 70, 71, 74, 93, 103, 106, 118, 127, 128, 131, 132, 133, 134, 135, 138, 139, 142, 150, 153

I53-40B (SEQ ID NO: 4)
Sequence: MSTINNQLKALKVIPVIAID NAEDIIPLGKVLAENGLPAA EITFRSSAAVKAIMLLRSAQ PEMLIGAGTILNGVQALAAK EAGATFVVSPGFNPNTVRAC QIIGIDIVPGVNNPSTVEAA LEMGLTTLKFFPAEASGGIS MVKSLVGPYGDIRLMPTGGI TPSNIDNYLAIPQVLACGGT WMVDKKLVTNGEWDEIARLT REIVEQVNP
Surface residues not near interface: 2, 3, 7, 9, 10, 12, 20, 21, 23, 26, 27, 30, 34, 38, 45, 60, 62, 75, 85, 94, 95, 122, 124, 126, 134, 139, 143, 151, 153, 161, 163, 166, 167, 170, 172, 180, 184, 185, 186, 189, 190, 192, 193, 194, 195, 198, 201, 202, 205, 208, 209

I53-47A (SEQ ID NO: 5)
Sequence: MPIFTLNTNIKATDVPSDFL SLTSRLVGLILSKPGSYVAV HINTDQQLSFGGSTNPAAFG TLMSIGGIEPSKNRDHSAVL FDHLNAMLGIPKNRMYIHFV NLNGDDVGWNGTTF
Surface residues not near interface: 11, 13, 14, 17, 34, 36, 37, 45, 47, 54, 55, 56, 65, 69, 70, 71, 74, 91, 92, 93, 101, 103, 105, 109, 110, 112, 114

I53-47B (SEQ ID NO: 6)
Sequence: MNQHSHKDYETVRIAVVRAR WHADIVDACVEAFEIAMAAI GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VNGGIYRHEFVASAVIDGMM NVQLSTGVPVLSAVLTPHRY RDSAEHHRFFAAHFAVKGVE AARACIEILAAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 20, 21, 24, 43, 44, 51, 63, 67, 70, 85, 87, 101, 105, 122, 123, 124, 125, 126, 147, 152, 153, 154

I53-50A (SEQ ID NO: 7)
Sequence: MKMEELFKKHKIVAVLRANS VEEAIEKAVAVFAGGVHLIE ITFTVPDADTVIKALSVLKE KGAIIGAGTVTSVEQCRKAV ESGAEFIVSPHLDEEISQFC KEKGVFYMPGVMTPTELVKA MKLGHTILKLFPGEVVGPQF VKAMKGPFPNVKFVPTGGVN LDNVCEWFKAGVLAVGVGSA LVKGTPDEVREKAKAFVEKI RGCTE
Surface residues not near interface: 4, 5, 6, 8, 9, 11, 17, 19, 23, 37, 46, 47, 59, 74, 77, 78, 81, 94, 95, 98, 101, 102, 103, 106, 119, 122, 126, 139, 142, 145, 149, 150, 152, 160, 161, 162, 163, 166, 169, 179, 183, 185, 188, 191, 192, 194, 198, 199

I53-50B (SEQ ID NO: 8)
Sequence: MNQHSHKDYETVRIAVVRAR WHAEIVDACVSAFEAAMADI GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VNGGIYRHEFVASAVIDGMM NVQLSTGVPVLSAVLTPHRY RDSDAHTLLFLALFAVKGME AARACVEILAAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 20, 21, 34, 38, 39, 40, 43, 44, 48, 51, 63, 67, 70, 87, 101, 105, 118, 143, 147, 152, 153, 154

I53-51A (SEQ ID NO: 9)
Sequence: MFTKSGDDGNTNVINKRVGK DSPLVNFLGDLDELNSFIGF AISKIPWEDMKKDLERVQVE LFEIGEDLSTQSSKKKIDES YVLWLLAATAIYRIESGPVK LFVIPGGSEEASVLHVTRSV ARRVERNAVKYTKELPEINR MIIVYLNRLSSLLFAMALVA NKRRNQSEKIYEIGKSW
Surface residues not near interface: 19, 20, 24, 28, 46, 47, 51, 70, 71, 73, 74, 75, 76, 102, 122, 130, 133, 134, 135, 136, 137, 140, 162, 163, 164, 165, 169, 175, 177

I53-51B (SEQ ID NO: 10)
Sequence: MNQHSHKDYETVRIAVVRAR WHADIVDQCVRAFEEAMADA GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VNGGIYRHEFVASAVIDGMM NVQLSTGVPVLSAVLTPHRY RSSREHHEFFREHFMVKGVE AAAACITILAAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 21, 27, 34, 38, 43, 48, 63, 67, 70, 85, 87, 101, 118, 125, 126, 129, 152, 153, 154

I52-03A (SEQ ID NO: 11)
Sequence: MGHTKGPTPQQHDGSALRIG IVHARWNKTIIMPLLIGTIA KLLECGVKASNIVVQSVPGS WELPIAVQRLYSASQLQTPS SGPSLSAGDLLGSSTTDLTA LPTTTASSTGPFDALIAIGV LIKGETMHFEYIADSVSHGL MRVQLDTGVPVIFGVLTVLI DDQAKARAGVIEGSHNHGED WGLAAVEMGVRRRDWAAGKT E
Surface residues not near interface: 6, 9, 10, 11, 13, 15, 16, 26, 48, 69, 75, 76, 78, 79, 111, 125, 127, 142, 146, 159, 160, 161, 162, 171, 175, 193, 194, 196, 197, 199, 200

I52-03B (SEQ ID NO: 12)
Sequence: MYEVDHADVYDLFYLGRGKD YAAEASDIADLVRSRTPEAS SLLDVACGTGTHLEHFTKEF GDTAGLELSEDMLTHARKRL PDATLHQGDMRDFQLGRKFS AVVSMFSSVGYLKTVAELGA AVASFAEHLEPGGVVVVEPW WFPETFADGWVSADVVRRDG RTVARVSHSVREGNATRMEV HFTVADPGKGVRHFSDVHLI TLFHQREYEAAFMAAGLRVE YLEGGPSGRGLFVGVPA
Surface residues not near interface: 2, 3, 5, 6, 8, 15, 17, 20, 22, 23, 26, 27, 30, 33, 34, 35, 37, 38, 40, 54, 55, 57, 58, 59, 61, 62, 68, 70, 71, 74, 77, 78, 79, 81, 82, 84, 86, 87, 91, 96, 97, 98, 111, 127, 130, 131, 132, 141, 144, 145, 148, 150, 154, 157, 158, 159, 160, 161, 171, 172, 173, 174, 177, 187, 189, 192, 198, 199, 222, 223, 224, 236

I52-32A (SEQ ID NO: 13)
Sequence: MGMKEKFVLIITHGDFGKGL LSGAEVIIGKQENVHTVGLN LGDNIEKVAKEVMRIIIAKL AEDKEIIIVVDLFGGSPFNI ALEMMKTFDVKVITGINMPM LVELLTSINVYDTTELLENI SKIGKDGIKVIEKSSLKM
Surface residues not near interface: 3, 5, 15, 18, 30, 32, 35, 40, 41, 42, 44, 45, 65, 73, 79, 91, 103, 106, 109, 110, 111, 112, 114, 115, 118, 122, 123, 125, 126, 129, 131

I52-32B (SEQ ID NO: 14)
Sequence: MKYDGSKLRIGILHARWNLE IIAALVAGAIKRLQEFGVKA ENIIIETVPGSFELPYGSKL FVEKQKRLGKPLDAIIPIGV LIKGSTMHFEYICDSTTHQL MKLNFELGIPVIFGVLTCLT DEQAEARAGLIEGKMHNHGE DWGAAAVEMATKFN
Surface residues not near interface: 4, 6, 7, 9, 17, 32, 35, 42, 59, 63, 64, 66, 67, 68, 69, 70, 71, 73, 83, 85, 90, 106, 119, 120, 121, 122, 125, 131, 133, 134, 135, 136, 154

I52-33A (SEQ ID NO: 15)
Sequence: MAVKGLGEVDQKYDGSKLRI GILHARWNRKIILALVAGAV LRLLEFGVKAENIIIETVPG SFELPYGSKLFVEKQKRLGK PLDAIIPIGVLIKGSTMHFE YICDSTTHQLMKLNFELGIP VIFGVLTCLTDEQAEARAGL IEGKMHNHGEDWGAAAVEMA TKFN
Surface residues not near interface: 12, 14, 16, 17, 19, 26, 27, 46, 69, 73, 74, 76, 77, 78, 80, 81, 83, 93, 95, 100, 116, 129, 130, 131, 132, 145, 164

I52-33B (SEQ ID NO: 16)
Sequence: MGANWYLDNESSRLSFTSTK NADIAEVHRFLVLHGKVDPK GLAEVEVETESISTGIPLRD MLLRVLVFQVSKFPVAQINA QLDMRPINNLAPGAQLELRL PLTVSLRGKSHSYNAELLAT RLDERRFQVVTLEPLVIHAQ DFDMVRAFNALRLVAGLSAV SLSVPVGAVLIFTAR
Surface residues not near interface: 4, 6, 10, 20, 21, 23, 24, 31, 32, 34, 36, 39, 40, 42, 44, 46, 48, 56, 73, 77, 79, 81, 83, 85, 88, 89, 91, 92, 96, 97, 99, 101, 103, 109, 110, 111, 112, 114, 124, 125, 138, 140, 143, 158, 175

I32-06A (SEQ ID NO: 17)
Sequence: MTDYIRDGSAIKALSFAIIL AEADLRHIPQDLQRLAVRVI HACGMVDVANDLAFSEGAGK AGRNALLAGAPILCDARMVA EGITRSRLPADNRVIYTLSD PSVPELAKKIGNTRSAAALD LWLPHIEGSIVAIGNAPTAL FRLFELLDAGAPKPALIIGM PVGFVGAAESKDELAANSRG VPYVIVRGRRGGSAMTAAAV NALASERE
Surface residues not near interface: 24, 26, 27, 41, 47, 50, 51, 56, 60, 63, 64, 67, 68, 77, 84, 85, 86, 91, 93, 98, 99, 100, 101, 102, 105, 108, 109, 114, 123, 124, 125, 127, 135, 142, 145, 148, 149, 152, 153, 169, 172, 173, 176, 177, 180, 187, 189

I32-06B (SEQ ID NO: 18)
Sequence: MITVFGLKSKLAPRREKLAE VIYSSLHLGLDIPKGKHAIR FLCLEKEDFYYPFDRSDDYT VIEINLMAGRSEETKMLLIF LLFIALERKLGIRAHDVEIT IKEQPAHCWGFRGRTGDSAR DLDYDIYV
Surface residues not near interface: 8, 9, 10, 13, 14, 15, 16, 17, 20, 34, 36, 45, 46, 47, 50, 51, 53, 54, 57, 67, 70, 91, 93, 95, 105, 112

I32-19A (SEQ ID NO: 19)
Sequence: MGSDLQKLQRFSTCDISDGL LNVYNIPTGGYFPNLTAISP PQNSSIVGTAYTVLFAPIDD PRPAVNYIDSVPPNSILVLA LEPHLQSQFHPFIKITQAMY GGLMSTRAQYLKSNGTVVFG RIRDVDEHRTLNHPVFAYGV GSCAPKAVVKAVGTNVQLKI LTSDGVTQTICPGDYIAGDN NGIVRIPVQETDISKLVTYI EKSIEVDRLVSEAIKNGLPA KAAQTARRMVLKDYI
Surface residues not near interface: 3, 4, 6, 7, 9, 10, 25, 27, 36, 40, 42, 43, 44, 49, 58, 59, 61, 62, 63, 70, 72, 73, 74, 82, 84, 88, 89, 109, 110, 112, 126, 127, 129, 130, 132, 146, 155, 156, 157, 159, 166, 169, 172, 189, 190, 192, 194, 195, 198, 201, 204, 215, 232

I32-19B (SEQ ID NO: 20)
Sequence: MSGMRVYLGADHAGYELKQA IIAFLKMTGHEPIDCGALRY DADDDYPAFCIAAATRTVAD PGSLGIVLGGSGNGEQIAAN KVPGARCALAWSVQTAALAR EHNNAQLIGIGGRMHTLEEA LRIVKAFVTTPWSKAQRHQR RIDILAEYERTHEAPPVPGA PA
Surface residues not near interface: 4, 5, 31, 33, 38, 41, 42, 43, 55, 56, 59, 61, 62, 83, 93, 94, 101, 104, 113, 119, 129, 131, 134, 136, 137, 139, 140, 143, 144, 146, 147, 150, 152, 153, 156, 158, 159

I32-28A (SEQ ID NO: 21)
Sequence: MGDDARIAAIGDVDELNSQI GVLLAEPLPDDVRAALSAIQ HDLFDLGGELCIPGHAAITE DHLLRLALWLVHYNGQLPPL EEFILPGGARGAALAHVCRT VCRRAERSIKALGASEPLNI APAAYVNLLSDLLFVLARVL NRAAGGADVLWDRTRAH
Surface residues not near interface: 4, 6, 7, 10, 14, 27, 30, 31, 33, 34, 41, 44, 45, 51, 52, 53, 54, 55, 56, 59, 76, 78, 79, 80, 81, 82, 83, 90, 103, 111, 115, 116, 131, 134, 142, 145, 147, 150

I32-28B (SEQ ID NO: 22)
Sequence: MILSAEQSFTLRHPHGQAAA LAFVREPAAALAGVQRLRGL DSDGEQVWGELLVRVPLLGE VDLPFRSEIVRTPQGAELRP LTLTGERAWVAVSGQATAAE GGEMAFAFQFQAHLATPEAE GEGGAAFEVMVQAAAGVTLL LVAMALPQGLAAGLPPA
Surface residues not near interface: 3, 4, 6, 8, 12, 15, 17, 18, 22, 26, 28, 32, 38, 39, 41, 43, 45, 46, 48, 50, 60, 66, 68, 71, 73, 74, 79, 81, 82, 83, 84, 86, 87, 95, 100, 103, 105, 109, 111, 113, 151, 152, 155, 156, 157

I53-40A.1 (SEQ ID NO: 23)
Sequence: MTKKVGIVDTTFARVDMASA AILTLKMESPNIKIIRKTVP GIKDLPVACKKLLEEEGCDI VMALGMPGKKEKDKVCAHEA SLGLMLAQLMTNKHIIEVFV HEDEAKDDAELKILAARRAI EHALNVYYLLFKPEYLIRMA GKGLRQGFEDAGPARE
Surface residues not near interface: 3, 4, 31, 33, 35, 36, 37, 51, 54, 55, 56, 57, 59, 69, 70, 71, 74, 93, 103, 106, 118, 127, 128, 131, 132, 133, 134, 135, 138, 139, 142, 150, 153

I53-40B.1 (SEQ ID NO: 24)
Sequence: MDDINNQLKRLKVIPVIAID NAEDIIPLGKVLAENGLPAA EITFRSSAAVKAIMLLRSAQ PEMLIGAGTILNGVQALAAK EAGADFVVSPGFNPNTVRAC QIIGIDIVPGVNNPSTVEQA LEMGLTTLKFFPAEASGGIS MVKSLVGPYGDIRLMPTGGI TPDNIDNYLAIPQVLACGGT WMVDKKLVRNGEWDEIARLT REIVEQVNP
Surface residues not near interface: 2, 3, 7, 9, 10, 12, 20, 21, 23, 26, 27, 30, 34, 38, 45, 60, 62, 75, 85, 94, 95, 122, 124, 126, 134, 139, 143, 151, 153, 161, 163, 166, 167, 170, 172, 180, 184, 185, 186, 189, 190, 192, 193, 194, 195, 198, 201, 202, 205, 208, 209

I53-47A.1 (SEQ ID NO: 25)
Sequence: MPIFTLNTNIKADDVPSDFL SLTSRLVGLILSKPGSYVAV HINTDQQLSFGGSTNPAAFG TLMSIGGIEPDKNRDHSAVL FDHLNAMLGIPKNRMYIHFV NLNGDDVGWNGTTF
Surface residues not near interface: 11, 13, 14, 17, 34, 36, 37, 45, 47, 54, 55, 56, 65, 69, 70, 71, 74, 91, 92, 93, 101, 103, 105, 109, 110, 112, 114

I53-47A.1NegT2 (SEQ ID NO: 26)
Sequence: MPIFTLNTNIKADDVPSDFL SLTSRLVGLILSEPGSYVAV HINIDQQLSFGGSTNPAAFG TLMSIGGIEPDKNEDHSAVL FDHLNAMLGIPKNRMYIHFV DLDGDDVGWNGTTF
Surface residues not near interface: 11, 13, 14, 17, 34, 36, 37, 45, 47, 54, 55, 56, 65, 69, 70, 71, 74, 91, 92, 93, 101, 103, 105, 109, 110, 112, 114

I53-47B.1 (SEQ ID NO: 27)
Sequence: MNQHSHKDHETVRIAVVRAR WHADIVDACVEAFEIAMAAI GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VNGGIYRHEFVASAVIDGMM NVQLDTGVPVLSAVLTPHRY RDSDEHHRFFAAHFAVKGVE AARACIEILNAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 20, 21, 24, 43, 44, 51, 63, 67, 70, 85, 87, 101, 105, 122, 123, 124, 125, 126, 147, 152, 153, 154

I53-47B.1NegT2 (SEQ ID NO: 28)
Sequence: MNQHSHKDHETVRIAVVRAR WHADIVDACVEAFEIAMAAI GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VDGGIYDHEFVASAVIDGMM NVQLDTGVPVLSAVLTPHEY EDSDEDHEFFAAHFAVKGVE AARACIEILNAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 20, 21, 24, 43, 44, 51, 63, 67, 70, 85, 87, 101, 105, 122, 123, 124, 125, 126, 147, 152, 153, 154

I53-50A.1 (SEQ ID NO: 29)
Sequence: MKMEELFKKHKIVAVLRANS VEEAIEKAVAVFAGGVHLIE ITFTVPDADTVIKALSVLKE KGAIIGAGTVTSVEQCRKAV ESGAEFIVSPHLDEEISQFC KEKGVFYMPGVMTPTELVKA MKLGHDILKLFPGEVVGPQF VKAMKGPFPNVKFVPTGGVN LDNVCEWFKAGVLAVGVGDA LVKGDPDEVREKAKKFVEKI RGCTE
Surface residues not near interface: 4, 5, 6, 8, 9, 11, 17, 19, 23, 37, 46, 47, 59, 74, 77, 78, 81, 94, 95, 98, 101, 102, 103, 106, 119, 122, 126, 139, 142, 145, 149, 150, 152, 160, 161, 162, 163, 166, 169, 179, 183, 185, 188, 191, 192, 194, 198, 199

I53-50A.1NegT2 (SEQ ID NO: 30)
Sequence: MKMEELFKKHKIVAVLRANS VEEAIEKAVAVFAGGVHLIE ITFTVPDADTVIKALSVLKE KGAIIGAGTVTSVEQCRKAV ESGAEFIVSPHLDEEISQFC KEKGVFYMPGVMTPTELVKA MKLGHDILKLFPGEVVGPEF VEAMKGPFPNVKFVPTGGVD LDDVCEWFDAGVLAVGVGDA LVEGDPDEVREDAKEFVEEI RGCTE
Surface residues not near interface: 4, 5, 6, 8, 9, 11, 17, 19, 23, 37, 46, 47, 59, 74, 77, 78, 81, 94, 95, 98, 101, 102, 103, 106, 119, 122, 126, 139, 142, 145, 149, 150, 152, 160, 161, 162, 163, 166, 169, 179, 183, 185, 188, 191, 192, 194, 198, 199

I53-50A.1PosT1 (SEQ ID NO: 31)
Sequence: MKMEELFKKHKIVAVLRANS VEEAIEKAVAVFAGGVHLIE ITFTVPDADTVIKALSVLKE KGAIIGAGTVTSVEQCRKAV ESGAEFIVSPHLDEEISQFC KEKGVFYMPGVMTPTELVKA MKLGHDILKLFPGEVVGPQF VKAMKGPFPNVKFVPTGGVN LDNVCKWFKAGVLAVGVGKA LVKGKPDEVREKAKKFVKKI RGCTE
Surface residues not near interface: 4, 5, 6, 8, 9, 11, 17, 19, 23, 37, 46, 47, 59, 74, 77, 78, 81, 94, 95, 98, 101, 102, 103, 106, 119, 122, 126, 139, 142, 145, 149, 150, 152, 160, 161, 162, 163, 166, 169, 179, 183, 185, 188, 191, 192, 194, 198, 199

I53-50B.1 (SEQ ID NO: 32)
Sequence: MNQHSHKDHETVRIAVVRAR WHAEIVDACVSAFEAAMRDI GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VNGGIYRHEFVASAVIDGMM NVQLDTGVPVLSAVLTPHRY RDSDAHTLLFLALFAVKGME AARACVEILAAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 20, 21, 34, 38, 39, 40, 43, 44, 48, 51, 63, 67, 70, 87, 101, 105, 118, 143, 147, 152, 153, 154

I53-50B.1NegT2 (SEQ ID NO: 33)
Sequence: MNQHSHKDHETVRIAVVRAR WHAEIVDACVSAFEAAMRDI GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VDGGIYDHEFVASAVIDGMM NVQLDTGVPVLSAVLTPHEY EDSDADTLLFLALFAVKGME AARACVEILAAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 20, 21, 34, 38, 39, 40, 43, 44, 48, 51, 63, 67, 70, 87, 101, 105, 118, 143, 147, 152, 153, 154

I53-50B.4PosT1 (SEQ ID NO: 34)
Sequence: MNQHSHKDHETVRIAVVRAR WHAEIVDACVSAFEAAMRDI GGDRFAVDVFDVPGAYEIPL HARTLAETGRYGAVLGTAFV VNGGIYRHEFVASAVINGMM NVQLNTGVPVLSAVLTPHNY DKSKAHTLLFLALFAVKGME AARACVEILAAREKIAA
Surface residues not near interface: 6, 7, 8, 9, 10, 11, 13, 18, 20, 21, 34, 38, 39, 40, 43, 44, 48, 51, 63, 67, 70, 87, 101, 105, 118, 143, 147, 152, 153, 154
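The variants in Table 2 differ from their parent sequences only by substitutions, so they can be compared position by position. The sketch below uses hypothetical helper names and is not part of the disclosure. It lists the substitutions that distinguish I53-50A.1NegT2 (SEQ ID NO: 30) from I53-50A.1 (SEQ ID NO: 29) and checks each position against the surface-residue list given for I53-50A (SEQ ID NO: 7). It then reports full-length identity and an approximate net charge change under a crude model (Lys/Arg = +1, Asp/Glu = -1). The sketch assumes that residue numbering starts at 1 with the initial Met. It handles only ungapped, equal-length sequences; identity between sequences of different lengths would require an alignment.

```python
# I53-50A.1 (SEQ ID NO: 29) and I53-50A.1NegT2 (SEQ ID NO: 30), copied from Table 2.
I53_50A_1 = "".join([
    "MKMEELFKKHKIVAVLRANS", "VEEAIEKAVAVFAGGVHLIE", "ITFTVPDADTVIKALSVLKE",
    "KGAIIGAGTVTSVEQCRKAV", "ESGAEFIVSPHLDEEISQFC", "KEKGVFYMPGVMTPTELVKA",
    "MKLGHDILKLFPGEVVGPQF", "VKAMKGPFPNVKFVPTGGVN", "LDNVCEWFKAGVLAVGVGDA",
    "LVKGDPDEVREKAKKFVEKI", "RGCTE",
])
I53_50A_1_NEGT2 = "".join([
    "MKMEELFKKHKIVAVLRANS", "VEEAIEKAVAVFAGGVHLIE", "ITFTVPDADTVIKALSVLKE",
    "KGAIIGAGTVTSVEQCRKAV", "ESGAEFIVSPHLDEEISQFC", "KEKGVFYMPGVMTPTELVKA",
    "MKLGHDILKLFPGEVVGPEF", "VEAMKGPFPNVKFVPTGGVD", "LDDVCEWFDAGVLAVGVGDA",
    "LVEGDPDEVREDAKEFVEEI", "RGCTE",
])
# Surface residues not near the interface listed for I53-50A (SEQ ID NO: 7).
SURFACE_I53_50A = {
    4, 5, 6, 8, 9, 11, 17, 19, 23, 37, 46, 47, 59, 74, 77, 78, 81, 94, 95, 98,
    101, 102, 103, 106, 119, 122, 126, 139, 142, 145, 149, 150, 152, 160, 161,
    162, 163, 166, 169, 179, 183, 185, 188, 191, 192, 194, 198, 199,
}
SIDE_CHAIN_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # crude model


def substitutions(parent, variant):
    """List (1-based position, old, new) for equal-length, ungapped sequences."""
    if len(parent) != len(variant):
        raise ValueError("this sketch handles substitutions only, not indels")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(parent, variant)) if a != b]


def percent_identity(parent, variant):
    """Identity over the full length of two equal-length sequences."""
    return 100.0 * (len(parent) - len(substitutions(parent, variant))) / len(parent)


def charge_change(subs):
    """Sum of per-position charge differences (new minus old)."""
    return sum(SIDE_CHAIN_CHARGE.get(new, 0) - SIDE_CHAIN_CHARGE.get(old, 0)
               for _, old, new in subs)


subs = substitutions(I53_50A_1, I53_50A_1_NEGT2)
for pos, old, new in subs:
    listed = "listed" if pos in SURFACE_I53_50A else "not listed"
    print(f"{old}{pos}{new}: {listed} in Table 2")
print(f"identity: {percent_identity(I53_50A_1, I53_50A_1_NEGT2):.1f}%")
print("approximate charge change per subunit:", charge_change(subs))
```

When run as written, the script reports nine substitutions (for example, K142E and K169D) and about 95.6% identity over the 205-residue length. It also reports a change of -15 in approximate charge per subunit. Eight of the nine substituted positions appear in the I53-50A surface-residue list.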

In certain instances, the polypeptides of the present invention can also tolerate non-conservative substitutions. The isolated polypeptides may be produced recombinantly or synthetically, using standard techniques in the art. The isolated polypeptides of the invention can be modified in a number of ways, including but not limited to the ways described above, either before or after assembly of the nanostructures of the invention. As used throughout the present application, the term “polypeptide” is used in its broadest sense to refer to a sequence of subunit amino acids. The polypeptides of the invention may comprise L-amino acids, D-amino acids (which are resistant to L-amino acid-specific proteases in vivo), or a combination of D- and L-amino acids.

In another aspect, the invention provides nanostructures, comprising:

(a) a plurality of first assemblies, each first assembly comprising a plurality of identical first polypeptides, wherein the first polypeptides comprise the polypeptide of any embodiment or combination of embodiments of the first aspect of the invention; and

(b) a plurality of second assemblies, each second assembly comprising a plurality of identical second polypeptides, wherein the second polypeptides comprise the polypeptide of any embodiment or combination of embodiments of the first aspect of the invention, and wherein the second polypeptide differs from the first polypeptide;

wherein the plurality of first assemblies non-covalently interact with the plurality of second assemblies to form a nanostructure.

As described in the examples that follow, a plurality (2, 3, 4, 5, 6, or more) of first polypeptides self-assemble to form a first assembly, and a plurality (2, 3, 4, 5, 6, or more) of second polypeptides self-assemble to form a second assembly. A plurality of these first and second assemblies then self-assemble non-covalently via the designed interfaces to produce the nanostructures of the invention. The designed interfaces on the polypeptides of the invention resemble natural protein-protein interfaces, with well-packed cores composed primarily of hydrophobic amino acid side chains surrounded by a periphery composed primarily of hydrophilic and charged side chains. These interfaces rigidly orient the assemblies within the nanostructures formed by self-assembly. As will be understood by those of skill in the art, the interaction between the first assembly and the second assembly is a non-covalent protein-protein interaction. Any suitable non-covalent interaction(s) can drive the association of the assemblies into the nanostructure, including but not limited to one or more of electrostatic interactions, π-effects, van der Waals forces, hydrogen bonding, and hydrophobic effects. In various embodiments, pentameric, trimeric, and dimeric first or second assemblies are arranged relative to each other such that their 5-fold, 3-fold, and 2-fold symmetry axes are aligned along the icosahedral 5-fold, 3-fold, and 2-fold symmetry axes, respectively.
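Because each assembly's symmetry axis is aligned with an icosahedral axis, the angle between the axes of the two assemblies is fixed by icosahedral geometry. A minimal sketch computing those angles follows. It places a regular icosahedron at the standard vertex coordinates (cyclic permutations of (0, ±1, ±φ)) and takes three vertices of one triangular face. The 5-fold axis passes through a vertex, the 3-fold axis through the face centre, and the 2-fold axis through an edge midpoint. The helper names are illustrative only.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

# Three mutually adjacent vertices of a regular icosahedron (edge length 2)
# whose vertices are the cyclic permutations of (0, +/-1, +/-PHI).
V1 = (0.0, 1.0, PHI)
V2 = (0.0, -1.0, PHI)
V3 = (PHI, 0.0, 1.0)


def add(*vectors):
    """Component-wise sum of 3-vectors."""
    return tuple(sum(components) for components in zip(*vectors))


def angle_deg(a, b):
    """Angle between two direction vectors, in degrees."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.degrees(math.acos(dot / norm))


five_fold = V1                # axis through a vertex
three_fold = add(V1, V2, V3)  # axis through the face centre (direction only)
two_fold = add(V1, V2)        # axis through the midpoint of edge V1-V2

angles = {
    "5-fold/3-fold": angle_deg(five_fold, three_fold),
    "5-fold/2-fold": angle_deg(five_fold, two_fold),
    "3-fold/2-fold": angle_deg(three_fold, two_fold),
}
for pair, value in angles.items():
    print(f"{pair}: {value:.2f} degrees")
```

The script prints about 37.38 degrees for the 5-fold/3-fold pair, 31.72 degrees for the 5-fold/2-fold pair, and 20.91 degrees for the 3-fold/2-fold pair. These are the angles between adjacent axes that separate the two assemblies in I53, I52, and I32 architectures. Each assembly's spin about its own axis and its distance along that axis remain free.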

In various other embodiments, the nanostructures are between about 20 nanometers (nm) and about 40 nm in diameter, with interior lumens between about 15 nm and about 32 nm across and pore sizes in the protein shells between about 1 nm and about 14 nm in their longest dimensions (FIG. 2). The nanostructures of the invention can be used for any suitable purpose, including but not limited to use as delivery vehicles. This is possible because the nanostructures can encapsulate molecules of interest and/or the first and/or second polypeptides can be modified to bind to molecules of interest (diagnostics, therapeutics, detectable molecules for imaging and other applications, etc.). The nanostructures of the invention are well suited for several applications, including vaccine design, targeted delivery of therapeutics, and bioenergy.

In various embodiments of the nanostructure of the invention, the first polypeptides and the second polypeptides comprise polypeptides with the amino acid sequences selected from the following pairs, or modified versions thereof (i.e., permissible modifications as disclosed for the polypeptides of the invention: isolated polypeptides comprising an amino acid sequence that is at least 75% identical over its length, and identical at least at one identified interface position, to the amino acid sequence indicated by the SEQ ID NO.):

(i) SEQ ID NO:1 and SEQ ID NO:2 (I53-34A and I53-34B);

(ii) SEQ ID NO:3 and SEQ ID NO:4 (I53-40A and I53-40B);

(iii) SEQ ID NO:3 and SEQ ID NO:24 (I53-40A and I53-40B.1);

(iv) SEQ ID NO:23 and SEQ ID NO:4 (I53-40A.1 and I53-40B);

(v) SEQ ID NO:35 and SEQ ID NO:36 (I53-40A genus and I53-40B genus);

(vi) SEQ ID NO:5 and SEQ ID NO:6 (I53-47A and I53-47B);

(vii) SEQ ID NO:5 and SEQ ID NO:27 (I53-47A and I53-47B.1);

(viii) SEQ ID NO:5 and SEQ ID NO:28 (I53-47A and I53-47B.1NegT2);

(ix) SEQ ID NO:25 and SEQ ID NO:6 (I53-47A.1 and I53-47B);

(x) SEQ ID NO:25 and SEQ ID NO:27 (I53-47A.1 and I53-47B.1);

(xi) SEQ ID NO:25 and SEQ ID NO:28 (I53-47A.1 and I53-47B.1NegT2);

(xii) SEQ ID NO:26 and SEQ ID NO:6 (I53-47A.1NegT2 and I53-47B);

(xiii) SEQ ID NO:26 and SEQ ID NO:27 (I53-47A.1NegT2 and I53-47B.1);

(xiv) SEQ ID NO:26 and SEQ ID NO:28 (I53-47A.1NegT2 and I53-47B.1NegT2);

(xv) SEQ ID NO:37 and SEQ ID NO:38 (I53-47A genus and I53-47B genus);

(xvi) SEQ ID NO:7 and SEQ ID NO:8 (I53-50A and I53-50B);

(xvii) SEQ ID NO:7 and SEQ ID NO:32 (I53-50A and I53-50B.1);

(xviii) SEQ ID NO:7 and SEQ ID NO:33 (I53-50A and I53-50B.1NegT2);

(xix) SEQ ID NO:7 and SEQ ID NO:34 (I53-50A and I53-50B.4PosT1);

(xx) SEQ ID NO:29 and SEQ ID NO:8 (I53-50A.1 and I53-50B);

(xxi) SEQ ID NO:29 and SEQ ID NO:32 (I53-50A.1 and I53-50B.1);

(xxii) SEQ ID NO:29 and SEQ ID NO:33 (I53-50A.1 and I53-50B.1NegT2);

(xxiii) SEQ ID NO:29 and SEQ ID NO:34 (I53-50A.1 and I53-50B.4PosT1);

(xxiv) SEQ ID NO:30 and SEQ ID NO:8 (I53-50A.1NegT2 and I53-50B);

(xxv) SEQ ID NO:30 and SEQ ID NO:32 (I53-50A.1NegT2 and I53-50B.1);

(xxvi) SEQ ID NO:30 and SEQ ID NO:33 (I53-50A.1NegT2 and I53-50B.1NegT2);

(xxvii) SEQ ID NO:30 and SEQ ID NO:34 (I53-50A.1NegT2 and I53-50B.4PosT1);

(xxviii) SEQ ID NO:31 and SEQ ID NO:8 (I53-50A.1PosT1 and I53-50B);

(xxix) SEQ ID NO:31 and SEQ ID NO:32 (I53-50A.1PosT1 and I53-50B.1);

(xxx) SEQ ID NO:31 and SEQ ID NO:33 (I53-50A.1PosT1 and I53-50B.1NegT2);

(xxxi) SEQ ID NO:31 and SEQ ID NO:34 (I53-50A.1PosT1 and I53-50B.4PosT1);

(xxxii) SEQ ID NO:39 and SEQ ID NO:40 (I53-50A genus and I53-50B genus);

(xxxiii) SEQ ID NO:9 and SEQ ID NO:10 (I53-51A and I53-51B);

(xxxiv) SEQ ID NO:11 and SEQ ID NO:12 (I52-03A and I52-03B);

(xxxv) SEQ ID NO:13 and SEQ ID NO:14 (I52-32A and I52-32B);

(xxxvi) SEQ ID NO:15 and SEQ ID NO:16 (I52-33A and I52-33B);

(xxxvii) SEQ ID NO:17 and SEQ ID NO:18 (I32-06A and I32-06B);

(xxxviii) SEQ ID NO:19 and SEQ ID NO:20 (I32-19A and I32-19B);

(xxxix) SEQ ID NO:21 and SEQ ID NO:22 (I32-28A and I32-28B); and

(xl) SEQ ID NO:23 and SEQ ID NO:24 (I53-40A.1 and I53-40B.1).
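For the I53-40, I53-47, and I53-50 families, the non-genus pairs listed above cover every combination of a listed first-component variant with a listed second-component variant. A short sketch regenerates those combinations from the SEQ ID NOs with a Cartesian product. The `VARIANTS` mapping is transcribed from the list, and the helper name is hypothetical.

```python
from itertools import product

# SEQ ID NOs of the listed first- and second-component variants (parent first).
VARIANTS = {
    "I53-40": ([3, 23], [4, 24]),
    "I53-47": ([5, 25, 26], [6, 27, 28]),
    "I53-50": ([7, 29, 30, 31], [8, 32, 33, 34]),
}


def variant_pairs(design):
    """All (first SEQ ID NO, second SEQ ID NO) combinations for a design."""
    first_ids, second_ids = VARIANTS[design]
    return list(product(first_ids, second_ids))


for design in VARIANTS:
    pairs = variant_pairs(design)
    print(design, len(pairs), pairs)
```

This yields 4, 9, and 16 pairs, respectively, matching the number of listed pairs for each family (excluding the genus pairs). Because any first-component variant is paired with any second-component variant, the listed modifications appear to be designed to leave the designed interface intact.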

In one embodiment, the nanostructure has icosahedral symmetry. In this embodiment, the nanostructure may comprise 60 copies of the first polypeptide and 60 copies of the second polypeptide. In one such embodiment, the number of identical first polypeptides in each first assembly is different from the number of identical second polypeptides in each second assembly. For example, in one embodiment, the nanostructure comprises twelve first assemblies and twenty second assemblies; in this embodiment, each first assembly may, for example, comprise five copies of the identical first polypeptide, and each second assembly may, for example, comprise three copies of the identical second polypeptide. In another embodiment, the nanostructure comprises twelve first assemblies and thirty second assemblies; in this embodiment, each first assembly may, for example, comprise five copies of the identical first polypeptide, and each second assembly may, for example, comprise two copies of the identical second polypeptide. In a further embodiment, the nanostructure comprises twenty first assemblies and thirty second assemblies; in this embodiment, each first assembly may, for example, comprise three copies of the identical first polypeptide, and each second assembly may, for example, comprise two copies of the identical second polypeptide. All of these embodiments are capable of forming synthetic nanomaterials with regular icosahedral symmetry. In various further embodiments, the oligomeric states of the first and second polypeptides are as follows:

I53-34A: trimer+I53-34B: pentamer;

I53-40A: pentamer+I53-40B: trimer;

I53-47A: trimer+I53-47B: pentamer;

I53-50A: trimer+I53-50B: pentamer;

I53-51A: trimer+I53-51B: pentamer;

I32-06A: dimer+I32-06B: trimer;

I32-19A: trimer+I32-19B: dimer;

I32-28A: trimer+I32-28B: dimer;

I52-03A: pentamer+I52-03B: dimer;

I52-32A: dimer+I52-32B: pentamer; and

I52-33A: pentamer+I52-33B: dimer.
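The oligomeric states above can be cross-checked against the 60 + 60 composition described for icosahedral nanostructures. The following minimal sketch assumes that the two digits after "I" in each design name give the symmetries of its two components, as the names suggest, although the text does not state this explicitly. For each design, it checks that assumption against the listed oligomeric states and computes how many assemblies of each kind make up one particle.

```python
# (first component, second component) copies per assembly, from the list above.
OLIGOMERIC_STATES = {
    "I53-34": (3, 5), "I53-40": (5, 3), "I53-47": (3, 5), "I53-50": (3, 5),
    "I53-51": (3, 5), "I32-06": (2, 3), "I32-19": (3, 2), "I32-28": (3, 2),
    "I52-03": (5, 2), "I52-32": (2, 5), "I52-33": (5, 2),
}
COPIES_PER_COMPONENT = 60  # 60 copies of each polypeptide per icosahedral particle


def composition(design):
    """Number of first and second assemblies and total chains for one design."""
    n_a, n_b = OLIGOMERIC_STATES[design]
    # Assumption: the two digits after "I" name the component symmetries.
    named_axes = {int(design[1]), int(design[2])}
    if {n_a, n_b} != named_axes:
        raise ValueError(f"{design}: oligomeric states do not match the name")
    return {
        "A assemblies": COPIES_PER_COMPONENT // n_a,
        "B assemblies": COPIES_PER_COMPONENT // n_b,
        "total chains": 2 * COPIES_PER_COMPONENT,
    }


for name in OLIGOMERIC_STATES:
    print(name, composition(name))
```

Every listed design passes the name check. The I53 designs come out as 12 pentamers plus 20 trimers, the I52 designs as 12 pentamers plus 30 dimers, and the I32 designs as 20 trimers plus 30 dimers, each totaling 120 chains. These counts match the three embodiments described above.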

As disclosed in the examples that follow, the nanostructures form spontaneously when appropriate polypeptide pairs are co-expressed in E. coli cells, yielding milligram quantities of purified material per liter of cell culture using standard methods of immobilized metal-affinity chromatography and gel filtration. When a poly-histidine purification tag is appended to just one of the two distinct polypeptide subunits (i.e., the first and second polypeptides) comprising each nanostructure, the other subunit is found to co-purify with the tagged subunit.

In one embodiment, the nanostructure further comprises a cargo within the nanostructure. As used herein, a “cargo” is any compound or material that can be incorporated on and/or within the nanostructure. For example, polypeptide pairs suitable for nanostructure self-assembly can be expressed and purified independently; they can then be mixed in vitro in the presence of a cargo of interest to produce the nanostructure comprising a cargo. This feature, combined with the protein nanostructures' large lumens and relatively small pore sizes, makes them well suited for the encapsulation of a broad range of cargo including, but not limited to, small molecules, nucleic acids, polymers, and other proteins. In turn, the protein nanostructures of the present invention could be used for many applications in medicine and biotechnology, including targeted drug delivery and vaccine design. For targeted drug delivery, targeting moieties could be fused or conjugated to the protein nanostructure exterior to mediate binding and entry into specific cell populations, and drug molecules could be encapsulated in the cage interior for release upon entry to the target cell or sub-cellular compartment. For vaccine design, antigenic epitopes from pathogens could be fused or conjugated to the cage exterior to stimulate development of adaptive immune responses to the displayed epitopes, with adjuvants and other immunomodulatory compounds attached to the exterior and/or encapsulated in the cage interior to help tailor the type of immune response generated for each pathogen. The polypeptide components may be modified as noted above. In one non-limiting example, the polypeptides can be modified, such as by introducing cysteine residues at defined positions to facilitate linkage to one or more antigens of interest as cargo. The nanostructure could then act as a scaffold presenting a large number of antigens for delivery as a vaccine to generate an improved immune response. Other modifications of the polypeptides as discussed above may also be useful for incorporating cargo into the nanostructure.

In certain embodiments, the nanostructures may comprise one or more peptides configured to bind or fuse with desired immunogens. In certain further embodiments, the nanostructure comprises one or more copies of variants of the trimeric proteins 1WOZ or 1WA3 (PDB ID codes) designed to form a nanostructure. These trimeric proteins have been demonstrated to be suitable for fusion with the trimeric HIV immunogen BG505 SOSIP (4-6). Such nanostructures could be used as scaffolds for the design of an HIV vaccine capable of inducing protective immune responses against the virus. In another embodiment, the nanostructures of the present invention could be useful as scaffolds for the attachment of enzymes on the interior and/or exterior of the cages. Such enzymes confer on the nanostructure the ability to catalyze biochemical pathways or other reactions. Such spatial patterning of enzymes has been shown to be important in natural systems in order to increase local substrate concentrations, sequester toxic intermediates, and/or reduce the rates of undesirable side reactions (7, 8). In another embodiment, the cargo may comprise a detectable cargo. For example, the nanostructures of the present invention could also be useful as single-cell or single-molecule imaging agents. The materials are large enough to be identified in cells by electron microscopy, and when tagged with fluorophores they are readily detectable by light microscopy. This feature makes them well suited to the task of correlating images of the same cells taken by light microscopy and electron microscopy.

In another aspect, the present invention provides isolated nucleic acids encoding a protein of the present invention. The isolated nucleic acid sequence may comprise RNA or DNA. As used herein, “isolated nucleic acids” are those that have been removed from their normal surrounding nucleic acid sequences in the genome or in cDNA sequences. Such isolated nucleic acid sequences may comprise additional sequences useful for promoting expression and/or purification of the encoded protein, including but not limited to polyA sequences, modified Kozak sequences, and sequences encoding epitope tags, export signals, secretory signals, nuclear localization signals, and plasma membrane localization signals. It will be apparent to those of skill in the art, based on the teachings herein, what nucleic acid sequences will encode the proteins of the invention.
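Because the genetic code is degenerate, many DNA sequences encode the same polypeptide. The minimal sketch below back-translates a protein sequence using one fixed synonymous codon per amino acid and appends a TAA stop codon. The particular codon choices and helper names are illustrative assumptions. A practical construct would normally be codon-optimized for the expression host and checked for unwanted sequence features, which this sketch does not attempt.

```python
# One synonymous codon per amino acid. This particular choice is arbitrary
# (an assumption for illustration); any codon of the standard genetic code
# for that amino acid would encode the same polypeptide.
CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP = "TAA"


def back_translate(protein, add_stop=True):
    """Return a DNA coding sequence for `protein` using the CODON table."""
    dna = "".join(CODON[aa] for aa in protein.upper())
    return dna + STOP if add_stop else dna


def translate(dna):
    """Translate DNA built with back_translate (only codons in CODON are known)."""
    amino_acid = {codon: aa for aa, codon in CODON.items()}
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon == STOP:
            break  # end of the open reading frame
        protein.append(amino_acid[codon])
    return "".join(protein)


cds = back_translate("MKMEELFKK")  # first nine residues of I53-50A (SEQ ID NO: 7)
print(cds)             # ATGAAAATGGAAGAACTGTTTAAAAAATAA
print(translate(cds))  # MKMEELFKK
```

The round trip through `translate` only confirms internal consistency with this codon table. It is not a general translator for arbitrary DNA.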

In a further aspect, the present invention provides recombinant expression vectors comprising the isolated nucleic acid of any embodiment or combination of embodiments of the invention operatively linked to a suitable control sequence. “Recombinant expression vector” includes vectors that operatively link a nucleic acid coding region or gene to any control sequences capable of effecting expression of the gene product. “Control sequences” operably linked to the nucleic acid sequences of the invention are nucleic acid sequences capable of effecting the expression of the nucleic acid molecules. The control sequences need not be contiguous with the nucleic acid sequences, so long as they function to direct the expression thereof. Thus, for example, intervening untranslated yet transcribed sequences can be present between a promoter sequence and the nucleic acid sequences, and the promoter sequence can still be considered “operably linked” to the coding sequence. Other such control sequences include, but are not limited to, polyadenylation signals, termination signals, and ribosome binding sites. Such expression vectors can be of any type known in the art, including but not limited to plasmid and viral-based expression vectors. The control sequence used to drive expression of the disclosed nucleic acid sequences in a mammalian system may be constitutive (driven by any of a variety of promoters, including but not limited to CMV, SV40, RSV, actin, and EF promoters) or inducible (driven by any of a number of inducible promoters, including but not limited to tetracycline-, ecdysone-, and steroid-responsive promoters). The construction of expression vectors for use in transfecting prokaryotic cells is also well known in the art, and thus can be accomplished via standard techniques. (See, for example, Sambrook, Fritsch, and Maniatis, in: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989; Gene Transfer and Expression Protocols, pp. 109-128, ed. E. J. Murray, The Humana Press Inc., Clifton, N.J.; and the Ambion 1998 Catalog, Ambion, Austin, Tex.) The expression vector must be replicable in the host organisms either as an episome or by integration into host chromosomal DNA. In a preferred embodiment, the expression vector comprises a plasmid. However, the invention is intended to include other expression vectors that serve equivalent functions, such as viral vectors.

In another aspect, the present invention provides host cells that have been transfected with the recombinant expression vectors disclosed herein, wherein the host cells can be either prokaryotic or eukaryotic. The cells can be transiently or stably transfected. Such transfection of expression vectors into prokaryotic and eukaryotic cells can be accomplished via any technique known in the art, including but not limited to standard bacterial transformation, calcium phosphate co-precipitation, electroporation, or liposome-mediated, DEAE-dextran-mediated, polycation-mediated, or viral-mediated transfection. (See, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual, 1989, Cold Spring Harbor Laboratory Press; and R. I. Freshney, Culture of Animal Cells: A Manual of Basic Technique, 2nd Ed., 1987, Liss, Inc., New York, N.Y.) A method of producing a polypeptide according to the invention is an additional part of the invention. The method comprises the steps of (a) culturing a host cell according to this aspect of the invention under conditions conducive to the expression of the polypeptide, and (b) optionally, recovering the expressed polypeptide.

In a further aspect, the present invention provides kits comprising:

(a) one or more of the isolated polypeptides, polypeptide assemblies, or nanostructures of the invention;

(b) one or more recombinant nucleic acids of the invention;

(c) one or more recombinant expression vectors comprising recombinant nucleic acids of the invention; and/or

(d) one or more recombinant host cells comprising recombinant expression vectors of the invention.

In yet a further aspect, the present invention provides methods of using the nanostructures of the present invention. Where both polypeptides that make up a nanostructure are capable of independent expression and purification, assembly can be controlled by mixing the purified components in vitro. This feature, combined with the nanostructures' large lumens and relatively small pore sizes, makes them well suited for the encapsulation of a broad range of other materials, including small molecules, nucleic acids, polymers, and other proteins, as discussed above. In turn, the nanostructures of the present invention could be used for many applications in medicine and biotechnology, including targeted drug delivery and vaccine design. For targeted drug delivery, targeting moieties could be fused or conjugated to the nanostructure exterior to mediate binding and entry into specific cell populations, and drug molecules could be encapsulated in the cage interior for release upon entry to the target cell or sub-cellular compartment. For vaccine design, antigenic epitopes from pathogens could be fused or conjugated to the nanostructure exterior to stimulate development of adaptive immune responses to the displayed epitopes, with adjuvants and other immunomodulatory compounds attached to the exterior and/or encapsulated in the cage interior to help tailor the type of immune response generated for each pathogen. Other uses will be clear to those of skill in the art based on the disclosure relating to polypeptide modifications, nanostructure design, and cargo incorporation.
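An icosahedral particle contains 60 copies of each polypeptide, so in vitro mixing of purified components would start from a 1:1 molar ratio of subunits. The following minimal sketch converts stock concentrations in mg/mL into the volumes needed for a chosen amount of each subunit. It computes subunit masses from sequence using standard average residue masses. The helper names, masses, and concentrations in the example are hypothetical. If an expressed construct carries a purification tag, the tag residues should be included in the sequence used for the mass.

```python
# Standard average residue masses in Da (free amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(seq):
    """Average molecular mass (Da) of an unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mixing_volumes_ul(mass_a, conc_a, mass_b, conc_b, nmol_each):
    """Volumes (uL) of two purified components giving nmol_each of each chain.

    mass_*: subunit mass in Da (g/mol); conc_*: stock concentration in mg/mL.
    mg/mL divided by g/mol gives mol/L, and 1 mol/L = 1000 nmol/uL.
    """
    vol_a = nmol_each * mass_a / (1000.0 * conc_a)
    vol_b = nmol_each * mass_b / (1000.0 * conc_b)
    return vol_a, vol_b


print(round(average_mass("GG"), 2))                        # 132.12
print(mixing_volumes_ul(25000.0, 1.0, 20000.0, 2.0, 2.0))  # (50.0, 20.0)
```

Working in moles rather than mass matters because the two subunits generally differ in size. Equal masses of the two components would therefore not give equal numbers of chains.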

Examples

Methods of production: The icosahedral materials disclosed herein (amino acid sequences provided in Table 1), which comprise possible embodiments of the present invention, were produced as follows. The initial sequences and structures for the design process were derived from pentameric, trimeric, and dimeric crystal structures from the Protein Data Bank (PDB), along with a small number of crystal structures of de novo designed proteins not yet deposited in the PDB.

The PDB accession numbers for the wild-type scaffold proteins related to the exemplary polypeptides of the invention are as follows:

-   SEQ ID NO:1 (I53-34A): 2yw3;
-   SEQ ID NO:2 (I53-34B): 2b98;
-   SEQ ID NO:3 (I53-40A): 2b98;
-   SEQ ID NO:4 (I53-40B): 4e38;
-   SEQ ID NO:5 (I53-47A): 1hfo;
-   SEQ ID NO:6 (I53-47B): 2obx;
-   SEQ ID NO:7 (I53-50A): 1wa3;
-   SEQ ID NO:8 (I53-50B): 2obx;
-   SEQ ID NO:9 (I53-51A): 1woz;
-   SEQ ID NO:10 (I53-51B): 2obx;
-   SEQ ID NO:11 (I52-03A): 1c41;
-   SEQ ID NO:12 (I52-03B): 3bxo;
-   SEQ ID NO:13 (I52-32A): 31fh;
-   SEQ ID NO:14 (I52-32B): 2jfb;
-   SEQ ID NO:15 (I52-33A): 2jfb;
-   SEQ ID NO:16 (I52-33B): 3q34;
-   SEQ ID NO:17 (I32-06A): 3e7d;
-   SEQ ID NO:18 (I32-06B): 1mww;
-   SEQ ID NO:19 (I32-19A): 2c5q;
-   SEQ ID NO:20 (I32-19B): 2vvp;
-   SEQ ID NO:21 (I32-28A): 2zhz; and
-   SEQ ID NO:22 (I32-28B): 3nqn.

A total of 15,552 pairs of pentamers and trimers, 50,400 pairs of pentamers and dimers, and 344,825 pairs of trimers and dimers were arranged in icosahedral symmetry, with the 5-fold symmetry axes of the pentamers, the 3-fold symmetry axes of the trimers, and the 2-fold symmetry axes of the dimers aligned along the 5-fold, 3-fold, and 2-fold icosahedral symmetry axes, respectively. While maintaining perfect icosahedral symmetry, rotations about and translations along these axes were sampled to identify configurations predicted to be suitable for protein-protein interface design. In total, 68,983 I53, 35,468 I52, and 177,252 I32 configurations were designed, yielding 71 pairs of I53 protein sequences, 44 pairs of I52 protein sequences, and 68 pairs of I32 protein sequences predicted to fold and assemble into the modeled icosahedral complexes.
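The subunit counts implied by the three architectures can be checked with simple arithmetic. The sketch below assumes only the standard geometry of an icosahedron (12 five-fold, 20 three-fold, and 30 two-fold symmetry positions); the function name `icosahedral_composition` is hypothetical.

```python
# Symmetry positions on an icosahedron for each rotational axis order:
# 6 five-fold, 10 three-fold, and 15 two-fold axes, each axis passing
# through two positions on opposite sides of the particle.
POSITIONS = {5: 12, 3: 20, 2: 30}


def icosahedral_composition(architecture: str) -> dict:
    """Oligomer and subunit counts for a two-component icosahedral design.

    The two digits after "I" give the symmetry orders of the components,
    e.g. "I53" places pentamers on the 5-fold axes and trimers on the
    3-fold axes.
    """
    if len(architecture) != 3 or architecture[0] != "I":
        raise ValueError(f"unexpected architecture name: {architecture!r}")
    orders = (int(architecture[1]), int(architecture[2]))
    if any(order not in POSITIONS for order in orders):
        raise ValueError(f"unsupported symmetry order in {architecture!r}")
    components = {}
    for order in orders:
        n_oligomers = POSITIONS[order]
        components[order] = {"oligomers": n_oligomers,
                             "subunits": n_oligomers * order}
    total = sum(c["subunits"] for c in components.values())
    return {"components": components, "total_subunits": total}


for name in ("I53", "I52", "I32"):
    print(name, icosahedral_composition(name)["total_subunits"])
```

Each component contributes 60 subunits (12 × 5, 20 × 3, or 30 × 2), which is why every architecture considered here yields a 120-subunit particle.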

Genes encoding the 71 pairs of I53 sequences were synthesized and cloned into a variant of the pET29b expression vector (Novagen, Inc.) between the NdeI and XhoI endonuclease restriction sites. Genes encoding the 44 pairs of I52 sequences and 68 pairs of I32 sequences were synthesized and cloned into a variant of the pET28b expression vector (Novagen, Inc.) between the NcoI and XhoI endonuclease restriction sites.

The two protein coding regions in each DNA construct are connected by an intergenic region. The intergenic region in the I53 designs was derived from the pETDuet-1 vector (Novagen, Inc.) and includes a stop codon, a T7 promoter/lac operator, and a ribosome binding site.

The intergenic region in the I52 and I32 designs includes only a stop codon and a ribosome binding site. The sequences of the I53, I52, and I32 intergenic regions are as follows:

I53 intergenic region DNA sequence:

(SEQ ID NO: 41) 5′-TAATGCTTAAGTCGAACAGAAAGTAATCGTATTGTACACGGCCGCATAATCGAAATTAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCATCTTAGTATATTAGTTAAGTATAAGAAGGAGATATACTT-3′

I52 intergenic region DNA sequence:

(SEQ ID NO: 42) 5′-TAAAGAAGGAGATATCAT-3′

I32 intergenic region DNA sequence:

(SEQ ID NO: 43) 5′-TGAGAAGGAGATATCAT-3′
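The elements named above can be located in the three intergenic sequences with a simple substring search. This is a sketch only: the motif strings are textbook consensus sequences supplied here as assumptions (they are not defined in this disclosure), and `annotate` is a hypothetical helper.

```python
# Intergenic regions from SEQ ID NOS: 41-43, with line-wrap spaces removed.
INTERGENIC = {
    "I53": ("TAATGCTTAAGTCGAACAGAAAGTAATCGTATTGTACACGGCCGCATAATCGAAAT"
            "TAATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCATCTTAGTAT"
            "ATTAGTTAAGTATAAGAAGGAGATATACTT"),
    "I52": "TAAAGAAGGAGATATCAT",
    "I32": "TGAGAAGGAGATATCAT",
}

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Consensus motifs assumed for illustration (not taken from the source text).
MOTIFS = {
    "T7 promoter": "TAATACGACTCACTATAGGG",
    "lac operator": "AATTGTGAGCGGATAACAATT",
    "Shine-Dalgarno core": "AGGAG",
}


def annotate(seq: str) -> dict:
    """Check for a leading stop codon and locate each motif.

    Motif positions are 0-based start indices, or None if the motif is absent.
    """
    seq = seq.upper()
    report = {"leading stop codon": seq[:3] in STOP_CODONS}
    for name, motif in MOTIFS.items():
        pos = seq.find(motif)
        report[name] = pos if pos >= 0 else None
    return report


for name, seq in INTERGENIC.items():
    print(name, annotate(seq))
```

Under these assumptions, all three regions begin with a stop codon and carry a ribosome binding site core, while only the I53 region also contains the T7 promoter and lac operator motifs.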

The constructs for the I53 protein pairs thus possess the following set of elements from 5′ to 3′: NdeI restriction site, upstream gene, intergenic region, downstream gene, XhoI restriction site. The constructs for the I52 and I32 protein pairs possess the following set of elements from 5′ to 3′: NcoI restriction site, upstream gene, intergenic region, downstream gene, XhoI restriction site. In each case, the upstream genes encode the components denoted with the suffix “A” and the downstream genes encode the “B” components (Table 1). This arrangement allows co-expression of the designed protein pairs, in which both the upstream and downstream genes have their own ribosome binding site and, in the case of the I53 designs, their own T7 promoter/lac operator.
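The 5′-to-3′ element order can be expressed as a small assembly function that also guards against a cloning site recurring inside the insert, which would cause the enzyme to cut within the cassette. This is a simplified sketch with hypothetical helper names: it prepends and appends the bare recognition sequences and does not model the overlap that normally exists between the NdeI (CATATG) or NcoI (CCATGG) site and the start codon.

```python
# Recognition sequences of the flanking enzymes (all three are palindromic,
# so scanning one strand also covers the reverse complement).
SITES = {"NdeI": "CATATG", "NcoI": "CCATGG", "XhoI": "CTCGAG"}


def internal_sites(seq: str, enzymes) -> dict:
    """Return {enzyme: [0-based positions]} for every occurrence in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme in enzymes:
        site = SITES[enzyme]
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


def assemble_cassette(upstream: str, intergenic: str, downstream: str,
                      five_prime_enzyme: str) -> str:
    """Join 5' site + gene A + intergenic region + gene B + XhoI site.

    Raises ValueError if either cloning site also occurs inside the insert,
    including at junctions created by the concatenation itself.
    """
    insert = (upstream + intergenic + downstream).upper()
    clashes = internal_sites(insert, (five_prime_enzyme, "XhoI"))
    if clashes:
        raise ValueError(f"internal restriction sites found: {clashes}")
    return SITES[five_prime_enzyme] + insert + SITES["XhoI"]
```

In this scheme an I53 pair would be assembled with `five_prime_enzyme="NdeI"` and an I52 or I32 pair with `"NcoI"`. Checking the concatenated insert, rather than each gene separately, catches sites that only appear where two elements meet.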

For purification purposes, each co-expression construct includes a 6x-histidine tag (HHHHHH) appended to the N- or C-terminus of one of the two protein coding regions.

Expression plasmids were transformed into BL21(DE3) E. coli cells. Cells were grown in LB medium supplemented with 50 mg L⁻¹ kanamycin (Sigma) at 37° C. until an OD600 of 0.8 was reached. Protein expression was induced by addition of 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG; Sigma) and allowed to proceed for either 5 h at 22° C. or 3 h at 37° C. before cells were harvested by centrifugation.
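The antibiotic and inducer concentrations above translate into stock volumes through the dilution relation C_stock × V_stock = C_final × V_final. The sketch below assumes a 50 mg/mL kanamycin stock and a 1 M IPTG stock; these stock concentrations are illustrative and are not specified in this disclosure.

```python
def stock_volume(final_conc: float, final_volume: float, stock_conc: float) -> float:
    """Volume of stock solution needed so that C_stock * V_stock = C_final * V_final.

    final_conc and stock_conc must use the same units; the result has the
    units of final_volume. The small volume added by the stock is neglected.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume / stock_conc


culture_ml = 1000.0  # 1 L culture
# Kanamycin: 50 mg/L = 0.05 mg/mL, from an assumed 50 mg/mL stock.
kan_ml = stock_volume(0.05, culture_ml, 50.0)
# IPTG: 0.5 mM, from an assumed 1 M (1000 mM) stock.
iptg_ml = stock_volume(0.5, culture_ml, 1000.0)
print(f"kanamycin stock: {kan_ml:.2f} mL, IPTG stock: {iptg_ml:.2f} mL")
```

With these assumed stocks, a 1 L culture needs 1.00 mL of kanamycin stock and 0.50 mL of IPTG stock.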

The designed proteins were first screened for soluble expression and co-purification at small scale (2 to 4 mL cultures) by nickel affinity chromatography using His MultiTrap® FF nickel-coated filter plates (GE Healthcare). Purification products were analyzed by SDS-PAGE to identify those containing species near the expected molecular weights of both protein subunits (indicating co-purification). Those found to contain both subunits were then subjected to native (non-denaturing) PAGE to identify slowly migrating species, further indicating assembly into higher-order materials. Designs that appeared to co-purify and yielded slowly migrating species by native PAGE were subsequently expressed at larger scale (1 to 12 liters of culture) and purified by nickel affinity chromatography, using either gravity columns with nickel-NTA resin (Qiagen) or HisTrap® HP columns (GE Healthcare). Fractions containing the designed proteins were pooled, concentrated using centrifugal filter devices (Sartorius Stedim Biotech), and further purified on a Superose® 6 10/300 gel filtration column (GE Healthcare).
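The two-stage screen amounts to a simple filter over per-design observations. In the sketch below, the `ScreenResult` record and its field names are hypothetical; only the selection criteria (both subunits on SDS-PAGE, then a slowly migrating species on native PAGE) come from the text.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    """Small-scale screening observations for one designed pair (hypothetical record)."""
    name: str
    band_a: bool       # SDS-PAGE band near the expected MW of component A
    band_b: bool       # SDS-PAGE band near the expected MW of component B
    slow_native: bool  # slowly migrating species on native PAGE


def co_purified(result: ScreenResult) -> bool:
    # Only one coding region carries the His tag, so recovering both
    # subunits from the nickel resin indicates they co-purified.
    return result.band_a and result.band_b


def select_for_scale_up(results: list[ScreenResult]) -> list[str]:
    """Names of designs that co-purify AND show a slowly migrating species."""
    return [r.name for r in results if co_purified(r) and r.slow_native]
```

Because native PAGE was only run on designs that passed the SDS-PAGE check, combining the two conditions with `and` reproduces the sequential screen.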

The purified proteins were analyzed by size exclusion chromatography on a Superose® 6 10/300 column to assess their assembly states. For each of the exemplary proteins described here, major peaks were observed in the chromatograms near elution volumes of 8.5 to 12 mL, which correspond well with the expected elution volumes for the designed 120-subunit icosahedral nanostructures. Within this set of exemplary proteins, the relative elution volumes correspond to the physical dimensions of the computational design models of the nanostructures; that is, proteins designed to assemble into relatively larger nanostructures yielded peaks at earlier elution volumes, while those designed to assemble into relatively smaller nanostructures yielded peaks at later elution volumes. In some cases, smaller secondary peaks were observed at slightly earlier elution volumes than the predominant peak, suggesting transient or low-affinity dimerization of the nanostructures.
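The two size exclusion criteria described above, a peak within the expected window and an elution order that follows model size, can be written as simple checks. In this sketch, the 8.5 to 12 mL window comes from the text; the model diameters passed in are hypothetical inputs, and the function names are illustrative.

```python
def in_expected_window(peak_ml: float, low_ml: float = 8.5, high_ml: float = 12.0) -> bool:
    """True if a major peak falls in the elution window reported for these designs."""
    return low_ml <= peak_ml <= high_ml


def elution_order_consistent(designs: dict[str, tuple[float, float]]) -> bool:
    """Check that larger design models never elute later than smaller ones.

    designs maps a name to (model diameter, peak elution volume in mL).
    """
    by_size = sorted(designs.values(), key=lambda d: d[0])  # smallest first
    volumes = [volume for _, volume in by_size]
    # Walking from smaller to larger particles, elution volume should not increase.
    return all(v_next <= v_prev for v_prev, v_next in zip(volumes, volumes[1:]))
```

The ordering check reflects the usual behavior of gel filtration, in which larger particles are excluded from more of the resin pore volume and therefore elute earlier.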

Gel filtration fractions containing pure protein in the desired assembly state were analyzed by negative stain electron microscopy as described previously (2). Electron micrographs showing fields of particles of the expected size and shape have been obtained for 10 of the nanostructures. In one case (I32-19), the nanostructure appears to be unstable under the conditions encountered during grid preparation, precluding visualization by electron microscopy.

To further validate the structures of our materials, small angle X-ray scattering (SAXS) data were obtained for several of the designed nanostructures. Scattering measurements were performed at the SIBYLS® 12.3.1 beamline at the Advanced Light Source, LBNL, on 20 microliter samples loaded into a helium-purged sample chamber (10). Data were collected on gel filtration fractions and on samples concentrated ~2- to 10-fold from individual fractions, with the gel filtration buffer and concentrator eluates used for buffer subtraction. Sequential exposures ranging from 0.5 to 5 seconds were taken at 12 keV to maximize the signal-to-noise ratio, with visual checks for radiation-induced damage to the protein. The FoXS® algorithm (11, 12) was then used to calculate scattering profiles from our design models and fit them to the experimental data. The major features of the I53-34, I53-40, I53-47, I53-50, I52-03, I52-32, I52-33, I32-06, I32-19, and I32-28 design models were all found to match well with the experimental data, supporting the conclusion that the nanostructures assemble to the intended assembly state and three-dimensional configuration in solution. Plots of the log of the scattering intensity, I(q), as a function of the scattering vector magnitude, q, show multiple large dips in scattering intensity in the low-q region between 0.015 Å⁻¹ and 0.15 Å⁻¹, each of which is closely recapitulated in the theoretical profiles calculated from the design models. Although the I53-51 design model was not found to match well with the SAXS data, this appears likely to be due to low stability of the designed material, which caused it to be primarily unassembled at the concentrations used for the SAXS measurements; this result is consistent with our findings from gel filtration of I53-51, in which significant peaks were observed corresponding to the unassembled pentamers and trimers in addition to the presumed 120-subunit assembly peak.
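Fitting a computed profile to experimental SAXS data typically involves scaling the model curve and scoring the residuals with a reduced χ. The sketch below is a simplified stand-in, not the FoXS implementation: it fits only an overall scale factor c, chosen analytically to minimize χ, whereas FoXS also adjusts excluded-volume and hydration-layer parameters. The function name is hypothetical.

```python
import math


def fit_scale_and_chi(i_exp, sigma, i_calc):
    """Fit I_exp ~ c * I_calc by weighted least squares and return (c, chi).

    chi = sqrt((1/M) * sum(((I_exp - c * I_calc) / sigma) ** 2)), where c is
    the value that minimizes chi: c = sum(w*I_exp*I_calc) / sum(w*I_calc**2)
    with weights w = 1 / sigma**2.
    """
    if not (len(i_exp) == len(sigma) == len(i_calc)) or not i_exp:
        raise ValueError("profiles must be non-empty and the same length")
    weights = [1.0 / s ** 2 for s in sigma]
    numerator = sum(w * e * m for w, e, m in zip(weights, i_exp, i_calc))
    denominator = sum(w * m * m for w, m in zip(weights, i_calc))
    c = numerator / denominator
    chi2 = sum(((e - c * m) / s) ** 2
               for e, m, s in zip(i_exp, i_calc, sigma)) / len(i_exp)
    return c, math.sqrt(chi2)
```

A χ near 1 indicates that the residuals are comparable to the experimental errors. Features such as the positions of the intensity minima are what distinguish a correctly assembled cage from unassembled components.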

Using the Rosetta macromolecular modeling suite, the computational models of the designed I53 materials were redesigned by allowing optimization of the identities of relatively exposed residues (defined as having a solvent accessible surface area greater than 20 square Ångstroms), excepting polar residues (Aspartate, Glutamate, Histidine, Lysine, Asparagine, Glutamine, and Arginine) and residues near the designed protein-protein interfaces between the pentameric and trimeric components. Mutations that resulted in losses of significant atomic packing interactions or side chain-backbone hydrogen bonds were discarded. A position-specific scoring matrix (PSSM) based on homologous protein sequences was used to augment the Rosetta score function to favor residues that appear frequently at a given position in homologous proteins, a design approach referred to as consensus protein design (9). Multiple design trajectories were performed with varying weights on the contribution of the PSSM, and mutations to polar residues that appeared favorable across all design trajectories were selected for inclusion in the variant protein. These variants were designated by the addition of “.1” to the end of their names (e.g., I53-50A.1).
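The position filter and the "favorable across all trajectories" rule can be sketched outside Rosetta as set operations. This is a hypothetical illustration that does not use any Rosetta API. It interprets "favorable across all trajectories" as the same polar substitution appearing in every trajectory, which is an assumption, and it does not model the PSSM weighting or the packing and hydrogen-bond filters.

```python
# Polar residues as listed in the text (one-letter codes).
POLAR = set("DEHKNQR")


def eligible_positions(native: str, sasa: dict[int, float],
                       near_interface: set[int], cutoff: float = 20.0) -> set[int]:
    """1-based positions open to redesign: exposed (SASA > cutoff, in square
    Angstroms), not already polar, and not near the designed interface."""
    return {pos for pos, aa in enumerate(native, start=1)
            if sasa.get(pos, 0.0) > cutoff
            and aa not in POLAR
            and pos not in near_interface}


def consensus_mutations(native: str, trajectories: list[dict[int, str]],
                        eligible: set[int]) -> dict[int, str]:
    """Keep a mutation only if every trajectory proposes the same polar residue.

    Each trajectory maps a 1-based position to its designed residue; positions
    a trajectory leaves unchanged may be omitted.
    """
    selected = {}
    for pos in sorted(eligible):
        wild_type = native[pos - 1]
        proposals = {traj.get(pos, wild_type) for traj in trajectories}
        if len(proposals) == 1:
            (choice,) = proposals
            if choice != wild_type and choice in POLAR:
                selected[pos] = choice
    return selected
```

Requiring agreement across trajectories run with different PSSM weights keeps only substitutions that are robust to how strongly the sequence-profile term is weighted.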

The Rosetta macromolecular modeling suite was used to mutate manually selected amino acid positions to charged amino acids in order to generate variant nanoparticles featuring highly positively or negatively charged interior surfaces. To generate negatively charged nanoparticles (denoted by the letters “Neg” in their names), mutations were limited to Aspartate or Glutamate. To generate positively charged nanoparticles (denoted by the letters “Pos” in their names), mutations were limited to Arginine or Lysine. Relevant score metrics for each mutation were independently assessed, and favorable mutations were sorted into two tiers based on their scores. Two new nanoparticle variant sequences were then designed for each individual protein and each type of charge, one including only the Tier 1 mutations (named “T1”) and the other including both the Tier 1 and Tier 2 mutations (named “T2”). In most cases, the charged mutations were incorporated into the consensus redesign variants described above.
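The tiering scheme can be sketched as a threshold split over scored mutations. The score metric and the tier cutoffs are not specified in the text. This sketch assumes that a lower score is more favorable and takes the cutoffs as hypothetical parameters; the function name is also hypothetical.

```python
# Residues permitted for each charge type, as stated in the text.
ALLOWED = {"Neg": set("DE"), "Pos": set("KR")}


def build_charge_variants(scored, charge, tier1_cutoff, tier2_cutoff):
    """Split scored mutations into tiers and build the T1 and T2 mutation sets.

    scored: iterable of (position, new_residue, score), where a lower score is
    assumed to be more favorable. Mutations to residues not allowed for the
    requested charge type are ignored.
    """
    if tier2_cutoff < tier1_cutoff:
        raise ValueError("tier 2 cutoff must not be stricter than tier 1")
    allowed = ALLOWED[charge]
    tier1, tier2 = [], []
    for position, residue, score in scored:
        if residue not in allowed:
            continue
        if score <= tier1_cutoff:
            tier1.append((position, residue))
        elif score <= tier2_cutoff:
            tier2.append((position, residue))
    return {f"{charge}T1": sorted(tier1), f"{charge}T2": sorted(tier1 + tier2)}
```

By construction, every T1 mutation set is contained in the corresponding T2 set, mirroring the nested naming (for example, “NegT1” and “NegT2”).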

Genes encoding the I53 “.1” and charged variant proteins were synthesized and cloned into the pET29b expression vector (Novagen, Inc.) between the NdeI and XhoI endonuclease restriction sites. Constructs were produced in two formats. In the first, the two proteins were encoded in a bicistronic arrangement on a single expression plasmid, as described above, for co-expression in E. coli. In the second, each protein component (i.e., the pentameric component and the trimeric component) was cloned individually into pET29b for expression in the absence of the other component.

For purification purposes, each co-expression construct included a 6x-histidine tag (HHHHHH) appended to the N- or C-terminus of one of the two protein coding regions. Similarly, each individual expression construct included a 6x-histidine tag appended to the N- or C-terminus of the protein coding region.

The “.1” and charged variant proteins were expressed and purified as described above, with two differences. First, expression at 18° C. was evaluated at small scale for all variants in addition to expression at 37° C., and in some cases expression at 18° C. was used to produce the proteins at multi-liter scale. Second, for some variants, the detergent 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS) was included in all purification buffers at a concentration of 0.75% weight/volume to prevent protein aggregation.
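For buffer preparation, a weight/volume percentage converts directly to grams per liter and, with the molecular weight, to a molar concentration. This sketch uses a CHAPS molecular weight of 614.88 g/mol (C32H58N2O7S); the helper name is hypothetical.

```python
CHAPS_MW = 614.88  # g/mol, molecular formula C32H58N2O7S


def percent_wv_to_mM(percent_wv: float, mw_g_per_mol: float) -> float:
    """Convert % weight/volume (g per 100 mL) to millimolar."""
    grams_per_liter = percent_wv * 10.0  # 1% w/v = 1 g/100 mL = 10 g/L
    return grams_per_liter / mw_g_per_mol * 1000.0


print(f"0.75% CHAPS = {0.75 * 10.0:.1f} g/L = {percent_wv_to_mM(0.75, CHAPS_MW):.1f} mM")
```

By this arithmetic, 0.75% CHAPS corresponds to 7.5 g per liter of buffer, or roughly 12.2 mM.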

After purification of individually expressed protein components, pairs of components designed to co-assemble into a nanoparticle (e.g., I53-40.1A and I53-40.1B) were mixed in equimolar amounts in buffer and allowed to incubate at room temperature for 1-24 hours, a procedure we refer to as “in vitro assembly.” For assemblies including charged components, the buffer included 500 mM NaCl; in all other cases, the buffer included 150 mM NaCl. The mixtures were fractionated and analyzed on a Superose® 6 10/300 gel filtration column (GE Healthcare), and fractions were analyzed by SDS-PAGE to determine the protein contents of each elution peak.
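Setting up an equimolar in vitro assembly reaction requires converting each purified stock to a molar concentration and computing the volumes to mix. This sketch assumes that concentrations are expressed per subunit (monomer) and that the remaining volume is made up with assembly buffer. The stock values in any real use would come from measurement; the helper names are hypothetical.

```python
def mg_per_ml_to_uM(mg_per_ml: float, mw_da: float) -> float:
    """Protein concentration in uM from mg/mL and subunit molecular weight (Da)."""
    # 1 mg/mL = 1 g/L; dividing by g/mol gives mol/L; multiplying by 1e6 gives uM.
    return mg_per_ml * 1_000_000 / mw_da


def equimolar_mix(stock_a_uM: float, stock_b_uM: float,
                  final_uM: float, final_volume_uL: float):
    """Volumes (uL) of component A, component B, and buffer for an equimolar mix."""
    vol_a = final_uM * final_volume_uL / stock_a_uM
    vol_b = final_uM * final_volume_uL / stock_b_uM
    vol_buffer = final_volume_uL - vol_a - vol_b
    if vol_buffer < 0:
        raise ValueError("stocks are too dilute for the requested final concentration")
    return vol_a, vol_b, vol_buffer
```

For example, `equimolar_mix(100.0, 50.0, 10.0, 500.0)` returns 50 uL of A, 100 uL of B, and 350 uL of buffer.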

In one exemplary embodiment, the I53-40.1A and I53-40.1B protein variants, based on I53-40A and I53-40B, respectively, were constructed by consensus protein design, in which multiple sequence alignments from protein families related to each protein subunit were used to guide the selection of amino acid residues at surface-exposed positions. When purified independently, the variant proteins were found to be more stable and soluble than the original proteins, a property that enabled formation of the designed nanostructure simply by mixing solutions containing the purified components in physiological buffers at a 1:1 molar ratio. The addition of 0.75% CHAPS, a zwitterionic detergent, to the buffer was found to further increase the stability and solubility of I53-40.1A, so it was included during purification of this protein prior to in vitro assembly. Size exclusion chromatograms from a run analyzing the mixed solution containing both components on a Superose 6 column revealed a single major peak at the elution volume expected for the 120-subunit designed icosahedral nanostructure. Analysis of the peak fractions by SDS-PAGE revealed bands at the expected molecular weights for the first and second polypeptides of the nanostructure in an apparent 1:1 stoichiometric ratio. The data demonstrate that, when mixed, the two components co-assemble into the 120-subunit designed icosahedral nanostructure.

In another exemplary embodiment, the I53-47A.1, I53-47B.1, I53-50A.1, and I53-50B.1 protein variants, based on I53-47A, I53-47B, I53-50A, and I53-50B, respectively, were constructed by consensus protein design, in which multiple sequence alignments from protein families related to each protein subunit were used to guide the selection of amino acid residues at surface-exposed positions. When purified independently, the variant proteins were found to be more stable and soluble than the original proteins, a property that enabled formation of the designed nanostructures simply by mixing solutions containing the purified components in physiological buffers at a 1:1 molar ratio, a process referred to as in vitro assembly. The addition of 0.75% CHAPS, a zwitterionic detergent, to the buffer was found to further increase the stability and solubility of I53-47B.1 and I53-50B.1, so it was included during purification of these proteins prior to in vitro assembly. Size exclusion chromatograms from a run analyzing the mixed solution containing both I53-47A.1 and I53-47B.1 on a Superose 6 column revealed a major peak at the elution volume expected for the 120-subunit designed icosahedral nanostructure, as well as a smaller secondary peak at a later elution volume. Analysis of the peak fractions corresponding to the 120-subunit nanostructure by SDS-PAGE revealed bands at the expected molecular weights for the first and second polypeptides of the nanostructure in an apparent 1:1 stoichiometric ratio. Analysis of the secondary peak at the later elution volume revealed that this peak comprises only the trimeric subunit, suggesting that the in vitro assembly mixture actually contained an excess of this polypeptide. Similarly, size exclusion chromatograms from a run analyzing the mixed solution containing both I53-50A.1 and I53-50B.1 on a Superose 6 column revealed a peak at the elution volume expected for the 120-subunit designed icosahedral nanostructure, as well as two secondary peaks at later elution volumes. Analysis of the peak fractions corresponding to the 120-subunit nanostructure by SDS-PAGE revealed bands at the expected molecular weights for the first and second polypeptides of the nanostructure in an apparent 1:1 stoichiometric ratio. Analysis of the secondary peaks at the later elution volumes revealed that the first of the two comprises only the pentameric subunit, while the second comprises only the trimeric subunit, suggesting that, for this pair of proteins, in vitro assembly is somewhat inefficient. Together, the data demonstrate that, when mixed, the two components of each nanostructure (i.e., I53-47A.1 and I53-47B.1, or I53-50A.1 and I53-50B.1) co-assemble into the 120-subunit designed icosahedral nanostructures.
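The appearance of a free-subunit peak when one component is in excess follows from simple limiting-reagent arithmetic. This sketch assumes the 60 + 60 subunit composition of an I53 particle and complete assembly of the limiting component. Real reactions may be less efficient, as the I53-50 results indicate. The helper name is hypothetical.

```python
def assembly_yield(n_a: float, n_b: float, a_per_particle: int = 60,
                   b_per_particle: int = 60):
    """Maximum number of complete particles and leftover subunits.

    n_a and n_b are amounts of each subunit in any consistent unit (e.g. pmol);
    the defaults reflect the 60 + 60 subunit composition of an I53 particle.
    Assumes every available subunit of the limiting component assembles.
    """
    particles = min(n_a / a_per_particle, n_b / b_per_particle)
    leftover_a = n_a - particles * a_per_particle
    leftover_b = n_b - particles * b_per_particle
    return particles, leftover_a, leftover_b
```

For example, a 20% excess of one component (`assembly_yield(60.0, 72.0)`) leaves one-sixth of that component unassembled. This unassembled material would elute after the 120-subunit particle.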

In another exemplary embodiment, the protein variants I53-47A.1NegT2, I53-47B.1NegT2, I53-50A.1NegT2, and I53-50B.1NegT2, based on I53-47A.1, I53-47B.1, I53-50A.1, and I53-50B.1, respectively, bear mutations that introduce additional negatively charged amino acid residues (i.e., Aspartate and Glutamate) on their surfaces, such that the nanostructures formed through assembly of these proteins have highly charged interior surfaces. After the two independently purified proteins I53-47A.1NegT2 and I53-47B.1NegT2 were mixed together in an in vitro assembly reaction in a buffer containing 150 mM NaCl, no assembly was observed when the mixture was analyzed on a Superose 6 size exclusion chromatography column; only unassembled I53-47A.1NegT2 and I53-47B.1NegT2 proteins eluted from the column. In contrast, when the in vitro assembly reaction was performed in the presence of 0.5 M NaCl, robust assembly to the designed nanostructure was observed, with some remaining unassembled proteins eluting later as smaller secondary peaks. Similarly, after the two independently purified proteins I53-50A.1NegT2 and I53-50B.1NegT2 were mixed together in an in vitro assembly reaction in a buffer containing 150 mM NaCl, no assembly was observed when the mixture was analyzed on a Superose® 6 size exclusion chromatography column; only unassembled I53-50A.1NegT2 and I53-50B.1NegT2 proteins eluted from the column. In contrast, when the in vitro assembly reaction was performed in the presence of 0.5 M NaCl, assembly to the designed nanostructure was observed, with some remaining unassembled proteins eluting later. Together, the data demonstrate that, when mixed, the two components of each highly charged 120-subunit designed icosahedral nanostructure assemble into the target structure only at high ionic strength.
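One conventional way to express how added salt weakens electrostatic interactions between charged surfaces is the Debye screening length. The source does not report this analysis; the sketch below is only an illustration of that general picture. It uses the Debye-Hückel approximation for water at 25 °C, 0.304/√I nm with the ionic strength I in mol/L, and ignores buffer components other than NaCl.

```python
import math


def ionic_strength(ions: dict[str, tuple[float, int]]) -> float:
    """I = 1/2 * sum(c_i * z_i**2), with c_i in mol/L and z_i the ion charge."""
    return 0.5 * sum(conc * charge ** 2 for conc, charge in ions.values())


def debye_length_nm(ionic_strength_M: float) -> float:
    """Approximate Debye screening length in water at 25 C: 0.304 / sqrt(I) nm."""
    return 0.304 / math.sqrt(ionic_strength_M)


for nacl in (0.150, 0.500):
    i = ionic_strength({"Na+": (nacl, 1), "Cl-": (nacl, -1)})
    print(f"{nacl * 1000:.0f} mM NaCl: Debye length ~ {debye_length_nm(i):.2f} nm")
```

Under this approximation, raising NaCl from 150 mM to 500 mM shortens the screening length from about 0.78 nm to about 0.43 nm. This is consistent with stronger screening of like-charge repulsion at the higher salt concentration.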

To package nucleic acids, pairs of individually purified protein components designed to co-assemble into a nanoparticle were combined with single-stranded DNA (ssDNA) in buffer and allowed to incubate overnight. ssDNA was present at a final concentration of 26 ng/μL (200 nM) for 400 nucleotide (nt) strands and 35.2 ng/μL (66.7 nM) for 1600 nt strands. Individual protein components were added at final equimolar concentrations ranging from 2-12, and the final NaCl concentration was 150 mM. After overnight incubation, samples were either analyzed directly by electrophoresis on a 1% agarose gel or first treated with DNase I (added to a final concentration of 25 pg/mL and incubated for one hour at room temperature) before electrophoresis. Gels were stained with SybrGold® (ThermoFisher Scientific) and imaged to visualize nucleic acid, and were subsequently stained with GelCode® Blue (Pierce) and imaged again to visualize protein.
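Converting an ssDNA mass concentration to a molar concentration requires the strand's molar mass, which depends on its length. This sketch uses the common approximation of about 330 g/mol per nucleotide; an exact value would require the actual base composition. The helper name is hypothetical.

```python
AVG_NT_MASS = 330.0  # g/mol per nucleotide; a common approximation for ssDNA


def ssdna_nM(ng_per_uL: float, length_nt: int, nt_mass: float = AVG_NT_MASS) -> float:
    """Molar concentration (nM) of an ssDNA strand from its mass concentration."""
    # ng/uL = mg/L = 1e-3 g/L; dividing by g/mol gives mol/L; x1e9 gives nM
    # (net factor 1e6).
    strand_mass = length_nt * nt_mass
    return ng_per_uL * 1e6 / strand_mass


print(f"{ssdna_nM(26.0, 400):.0f} nM")   # 400 nt strands: ~197 nM
print(f"{ssdna_nM(35.2, 1600):.1f} nM")  # 1600 nt strands: ~66.7 nM
```

Because molar concentration scales inversely with length, the 1600 nt sample has fewer strands per microliter despite its higher mass concentration.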

The above definitions and explanations are meant and intended to be controlling in any future construction unless clearly and unambiguously modified in the following examples or when application of the meaning renders any construction meaningless or essentially meaningless. In cases where the construction of the term would render it meaningless or essentially meaningless, the definition should be taken from Webster's Dictionary, 3rd Edition or a dictionary known to those of skill in the art, such as the Oxford Dictionary of Biochemistry and Molecular Biology (Ed. Anthony Smith, Oxford University Press, Oxford, 2004).

The above description provides specific details for a thorough understanding of, and enabling description for, embodiments of the disclosure. However, one skilled in the art will understand that the disclosure may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the disclosure. The description of embodiments of the disclosure is not intended to be exhaustive or to limit the disclosure to the precise form disclosed. While specific embodiments of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize.

Aspects of the disclosure can be modified, if necessary, to employ the systems, functions and concepts of the above references and application to provide yet further embodiments of the disclosure. These and other changes can be made to the disclosure in light of the detailed description.

Specific elements of any of the foregoing embodiments can be combined or substituted for elements in other embodiments. Furthermore, while advantages associated with certain embodiments of the disclosure have been described in the context of these embodiments, other embodiments may also exhibit such advantages, and not all embodiments need necessarily exhibit such advantages to fall within the scope of the disclosure.

The above detailed description describes various features and functions of the disclosed systems, devices, and methods with reference to the accompanying figures. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. The illustrative embodiments described in the detailed description, figures, and claims are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein.

Numerous modifications and variations of the present disclosure are possible in light of the above teachings. Unless otherwise indicated, all numbers expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth used in the specification and claims are to be understood as being modified in all instances by the term “about.” Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements. Groupings of alternative elements or embodiments of the invention disclosed herein are not to be construed as limitations. Each group member may be referred to and claimed individually or in any combination with other members of the group or other elements found herein. It is anticipated that one or more members of a group may be included in, or deleted from, a group for reasons of convenience and/or patentability. When any such inclusion or deletion occurs, the specification is deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.

Furthermore, numerous references have been made to patents and printed publications throughout this specification. Each of the above-cited references and printed publications is individually incorporated herein by reference in its entirety.

It is to be understood that the embodiments of the invention disclosed herein are illustrative of the principles of the present invention. Other modifications that may be employed are within the scope of the invention. Thus, by way of example, but not of limitation, alternative configurations of the present invention may be utilized in accordance with the teachings herein.

Accordingly, the present invention is not limited to that precisely as shown and described. The particulars shown herein are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of various embodiments of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for the fundamental understanding of the invention, the description taken with the drawings and/or examples making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.

REFERENCES

-   1. N. P. King et al., Computational design of self-assembling protein nanomaterials with atomic level accuracy. Science 336, 1171 (Jun. 1, 2012).
-   2. N. P. King et al., Accurate design of co-assembling multi-component protein nanomaterials. Nature 510, 103 (Jun. 5, 2014).
-   3. S. Raman et al., Design of Peptide Nanoparticles Using Simple Protein Oligomerization Domains. The Open Nanomedicine Journal 2, 15 (2009).
-   4. J. P. Julien et al., Crystal structure of a soluble cleaved HIV-1 envelope trimer. Science 342, 1477 (Dec. 20, 2013).
-   5. D. Lyumkis et al., Cryo-EM structure of a fully glycosylated soluble cleaved HIV-1 envelope trimer. Science 342, 1484 (Dec. 20, 2013).
-   6. M. Pancera et al., Structure and immune recognition of trimeric pre-fusion HIV-1 Env. Nature (Oct. 8, 2014).
-   7. T. O. Yeates, C. S. Crowley, S. Tanaka, Bacterial microcompartment organelles: protein shell structure and evolution. Annu. Rev. Biophys. 39, 185 (2010).
-   8. P. Kumar, M. Singh, S. Karthikeyan, Crystal structure analysis of icosahedral lumazine synthase from Salmonella typhimurium, an antibacterial drug target. Acta Crystallogr. D Biol. Crystallogr. 67, 131 (February, 2011).
-   9. C. Jäckel, J. D. Bloom, P. Kast, F. H. Arnold, D. Hilvert, Consensus Protein Design without Phylogenetic Bias. J. Mol. Biol. 399, 541 (2010).
-   10. G. L. Hura et al., Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat. Methods 6, 606 (2009).
-   11. D. Schneidman-Duhovny, M. Hammel, J. A. Tainer, A. Sali, Accurate SAXS Profile Computation and its Assessment by Contrast Variation Experiments. Biophys. J. 105, 962 (2013).
-   12. D. Schneidman-Duhovny, M. Hammel, A. Sali, FoXS: a web server for rapid computation and fitting of SAXS profiles. Nucleic Acids Res. 38, W540 (2010).

1.-20. (canceled)
21. A self-assembling icosahedral protein nanostructure, comprising: 20 trimeric components having 3-fold symmetry, each trimeric component comprising 3 first subunits, wherein each first subunit comprises a polypeptide having a polypeptide sequence that is at least 80% identical to SEQ ID NO: 7 or 29-31; and 12 pentameric components having 5-fold symmetry, each pentameric component comprising 5 second subunits, wherein each second subunit comprises a polypeptide having a polypeptide sequence that is at least 80% identical to SEQ ID NO: 8 or 32-34.
22. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is at least 90% identical to SEQ ID NO: 7 or 29-31; and wherein the polypeptide sequence of each second subunit is at least 90% identical to SEQ ID NO: 8 or 32-34.
23. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is at least 95% identical to SEQ ID NO: 7 or 29-31; and wherein the polypeptide sequence of each second subunit is at least 95% identical to SEQ ID NO: 8 or 32-34.
24. The nanostructure of claim 23, wherein the trimeric components and the pentameric components form an I53 icosahedral architecture.
25. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 7; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 8.
26. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 7; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 32.
27. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 7; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 33.
28. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 7; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 34.
29. The nanostructure of claim 28, wherein the trimeric components and the pentameric components form an I53 icosahedral architecture.
30. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 29; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 8.
31. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 29; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 32.
32. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 29; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 33.
33. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 29; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 34.
34. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 30; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 8.
35. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 30; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 32.
36. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 30; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 33.
37. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 30; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 34.
38. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 31; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 8.
39. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 31; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 32.
40. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 31; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 33.
41. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 31; and wherein the polypeptide sequence of each second subunit is SEQ ID NO: 34.
42. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is at least 95% identical to SEQ ID NO: 7.
43. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is at least 95% identical to SEQ ID NO: 29.
44. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is at least 95% identical to SEQ ID NO: 30.
45. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is at least 95% identical to SEQ ID NO: 31.
46. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is at least 95% identical to SEQ ID NO: 8.
47. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is at least 95% identical to SEQ ID NO: 32.
48. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is at least 95% identical to SEQ ID NO: 33.
49. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is at least 95% identical to SEQ ID NO: 34.
50. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 7.
51. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 29.
52. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 30.
53. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is SEQ ID NO: 31.
54. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is SEQ ID NO: 8.
55. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is SEQ ID NO: 32.
56. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is SEQ ID NO: 33.
57. The nanostructure of claim 21, wherein the polypeptide sequence of each second subunit is SEQ ID NO: 34.
58. The nanostructure of claim 21, wherein the polypeptide sequence of each first subunit is identical to SEQ ID NO: 7 or 29-31 at at least 3 of interface amino acid positions 25, 29, 33, 54, and 57; and wherein the polypeptide sequence of each second subunit is identical to SEQ ID NO: 8 or 32-34 at at least 3 of interface amino acid positions 24, 28, 36, 124, 125, 127, 128, 129, 131, 132, 133, 135, and
 139. 59.The nanostructure of claim 23, wherein the polypeptide sequence of eachfirst subunit is identical to SEQ ID NO: 7 or 29-31 at at least 3 ofinterface amino acid positions 25, 29, 33, 54, and 57; and wherein thepolypeptide sequence of each second subunit is identical to SEQ ID NO: 8or 32-34 at at least 3 of interface amino acid positions 24, 28, 36,124, 125, 127, 128, 129, 131, 132, 133, 135, and
 139. 60. Thenanostructure of claim 24, wherein the polypeptide sequence of eachfirst subunit is identical to SEQ ID NO: 7 or 29-31 at at least 3 ofinterface amino acid positions 25, 29, 33, 54, and 57; and wherein thepolypeptide sequence of each second subunit is identical to SEQ ID NO: 8or 32-34 at at least 3 of interface amino acid positions 24, 28, 36,124, 125, 127, 128, 129, 131, 132, 133, 135, and 139.